Abstract. The theorem of factorisation forests shows the existence of nested factorisations, à la Ramsey, for finite words. This theorem has important applications in semigroup theory and beyond. The purpose of this paper is to illustrate the importance of this approach in the context of automata over infinite words and trees.

We extend the theorem of factorisation forests in two directions: we show that it is still valid for any word indexed by a linear ordering, and we show that it admits a deterministic variant for words indexed by well-orderings. A byproduct of this work is also an improvement on the known bounds for the original result.

We apply the first variant to give a simplified proof of the closure under complementation of rational sets of words indexed by countable scattered linear orderings. We apply the second variant in the analysis of monadic second-order logic over trees, yielding new results on monadic interpretations over trees. Consequences of it are new characterisations of prefix-recognisable structures and of the Caucal hierarchy.

1 Introduction 

Factorisation forests were introduced by Simon [23]. The associated theorem, which we call the theorem of factorisation forests below, states that for every semigroup morphism from words to a finite semigroup $S$, every word has a ramseyan factorisation tree of height linearly bounded by $|S|$ (see below). An alternative presentation states that for every morphism $\varphi$ from $A^+$ to some finite semigroup $S$, there exists a regular expression evaluating to $A^+$ in which the Kleene star $L^*$ is allowed only when $\varphi(L) = \{e\}$ for some idempotent $e = e^2 \in S$; i.e. the Kleene star is allowed only if it produces a ramseyan factorisation of the word.

The theorem of factorisation forests provides a very deep insight into the structure of finite semigroups, and therefore has many applications. Let us cite some of them. Distance automata are nondeterministic finite automata mapping words to natural numbers. An important question concerning them is the limitedness problem: decide whether this mapping is bounded or not. It has been shown decidable by Simon using the theorem of factorisation forests [23]. This theorem also allows a constructive proof of Brown's lemma on locally finite semigroups [6]. It is also used in the characterisation of subfamilies of the regular languages, for instance the polynomial closure of varieties in [17], and to give general characterisations of finite semigroups [2UJ. In this last paper, the result is applied to prove McNaughton's determinisation result for automata over infinite words [TS]. In the context of languages of infinite words indexed by $\omega$, it has also been used in a complementation procedure [5] extending Büchi's lemma pQ.

The present paper aims first at advertising the theorem of factorisation forests which, though already used in many papers, is in fact known only to a rather limited community. The reason for this is that all of its proofs rely on Green's relations: Green's relations are an extremely important tool in semigroup theory, but they are technical and uncomfortable to work with. The merit of the factorisation forest theorem is that it is usable without any significant knowledge of semigroup theory, while it encapsulates nontrivial parts of this theory. Furthermore, as briefly mentioned above and also in this paper, this theorem already has important applications in automata theory. This is why it is worth advertising outside the semigroup community as a major tool in automata theory.

The technical contribution of the paper is an investigation of the potential use of factorisation forests in broader contexts than finite words. An important objective is to be able to apply this theorem to infinite words, and to trees instead of words. These attempts are embodied in two new variants of the theorem. As a byproduct, we improve the known bounds of the original result (in particular the previous improvement of [13]).

We also provide some applications of these results. We give a new proof of the result of Carton and Rispal showing the closure under complementation of rational languages of words with countable scattered linear domain [TO]. We use the other variant of the theorem to prove a decomposition result for monadic interpretations (in fact the application of a technique that we call compaction). This yields new characterisations of prefix-recognisable structures and of the Caucal hierarchy.

However, the applications of these results go beyond the ones proposed here. In particular, let us mention the work of Blumensath [3], who applies the deterministic variant of the theorem presented here to give a new proof of Rabin's theorem [21]. Rabin's theorem states that the monadic theory of the infinite binary tree is decidable. Different proofs have been proposed for this result so far, all relying on automata theory, and most of them on parity games (see [25] for a survey). For the simpler theory of the natural numbers with successor, whose decidability was originally proved by Büchi [T], another proof technique is known: the compositional method of Shelah [23]. In this seminal paper, Shelah asks whether there exists a proof of Rabin's theorem along the same lines. Blumensath [4] answers this longstanding open question positively.

The content of the paper is organised as follows. Section 2 is dedicated to definitions. Section 3 presents the original theorem of factorisation forests as well as two less standard presentations of it. In this section we also introduce the notion of a ramseyan split, which is central in the remainder of the paper. In Section 4 we provide the first extension of the theorem, the extension to all complete linear orderings. Section 5 is dedicated to the application of this extension to the complementation of automata over countable scattered linear orderings. In Section 6 we provide the second extension of the theorem, to ordinals only this time, but with an extra property of determinism. Finally, in Section 7 we develop the technique of compaction and use it to provide a new decomposition result for monadic interpretations applied to trees. We also show how this impacts the theory of infinite structures.

2 Definitions 

In this section, we successively present linear orderings, words indexed by them, semigroups and additive labellings, and finally structures, graphs, trees and logics.

2.1 Linear orderings 

A linear ordering $\alpha = (L, <)$ is a set $L$ equipped with a total ordering relation $<$, i.e. an irreflexive, antisymmetric and transitive relation such that for all distinct elements $x, y$ in $L$, either $x < y$ or $y < x$. A subordering $\beta$ of $\alpha$ is a subset of $L$ equipped with the same ordering relation, i.e. $\beta = (L', <)$ with $L' \subseteq L$. We write $\beta \subseteq \alpha$. Below, we omit the ordering relation $<$ unless necessary, and just say that $L$ is a linear ordering. A convex subset of $\alpha$ is a subset $S$ of $\alpha$ such that for all $x, y \in S$ and $x < z < y$, $z \in S$. We use the notations $[x,y]$, $[x,y[$, $]x,y]$, $]x,y[$, $]-\infty,y]$, $]-\infty,y[$, $[x,+\infty[$ and $]x,+\infty[$ for the usual intervals. Intervals are convex, but the converse does not hold in general: for instance, the set of rationals $x$ with $x^2 < 2$ is convex in $(\mathbb{Q}, <)$ but is not an interval. Given two subsets $X, Y$ of a linear ordering, $X < Y$ holds if for all $x \in X$ and $y \in Y$, $x < y$.

The sum of two linear orderings $\alpha_1 = (L_1, <_1)$ and $\alpha_2 = (L_2, <_2)$ (up to renaming, assume $L_1$ and $L_2$ disjoint), denoted $\alpha_1 + \alpha_2$, is the linear ordering $(L_1 \cup L_2, <)$ with $<$ coinciding with $<_1$ on $L_1$, with $<_2$ on $L_2$, and such that $L_1 < L_2$. More generally, given a linear ordering $\alpha = (L, <)$ and, for each $x \in L$, a linear ordering $\beta_x = (K_x, <_x)$ (the $K_x$ are assumed disjoint), we denote by $\sum_{x \in \alpha} \beta_x$ the linear ordering $(\bigcup_{x \in L} K_x, <')$ with $x' <' y'$ if $x < y$, or $x = y$ and $x' <_x y'$, where $x' \in K_x$ and $y' \in K_y$.
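For finite orderings, the generalized sum can be computed directly. The following Python sketch is only an illustration under stated assumptions: each finite linear ordering is given as a list of its elements in increasing order, and the disjointness of the $K_x$ is realised by tagging each element $x'$ of $\beta_x$ as the pair $(x, x')$. The helper names `ordered_sum` and `sum_less` are hypothetical.

```python
def ordered_sum(alpha, beta):
    """Generalized sum of the orderings beta[x], x ranging over alpha.

    alpha: list of the elements of alpha, in increasing order.
    beta:  dict mapping each x in alpha to the list of elements of beta_x,
           in increasing order.
    The element x' of beta_x is tagged as (x, x'), so the blocks are disjoint.
    The result lists the elements of the sum in increasing order.
    """
    return [(x, xp) for x in alpha for xp in beta[x]]


def sum_less(alpha, beta, p, q):
    """The order <' of the sum: compare the outer indices first, then the inner ones."""
    (x, xp), (y, yp) = p, q
    if x != y:
        return alpha.index(x) < alpha.index(y)
    return beta[x].index(xp) < beta[x].index(yp)


# alpha_1 + alpha_2 is the special case of a two-element index ordering.
alpha = ["first", "second"]
beta = {"first": ["a", "b"], "second": ["c"]}
total = ordered_sum(alpha, beta)
print(total)  # [('first', 'a'), ('first', 'b'), ('second', 'c')]
```

Tagging elements with their block index is one convenient way to make the $K_x$ disjoint "up to renaming"; the comparison `sum_less` is exactly the lexicographic definition of $<'$.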

A linear ordering $\alpha$ is well-ordered if every nonempty subset has a minimal element. It is complete if every nonempty subset of $\alpha$ with an upper bound has a least upper bound in $\alpha$, and every nonempty subset of $\alpha$ with a lower bound has a greatest lower bound in $\alpha$.

A cut in a linear ordering $\alpha = (L, <)$ is a pair $(E, F)$ where $\{E, F\}$ is a partition of $L$ (one of $E$, $F$ may be empty) and $E < F$. Cuts are totally ordered by $(E, F) \le (E', F')$ if $E \subseteq E'$. This order has a minimal element $\bot = (\emptyset, L)$ and a maximal element $\top = (L, \emptyset)$. We denote by $\bar\alpha$ the set of cuts over $L$ and by $\bar\alpha^*$ the set $\bar\alpha \setminus \{\bot, \top\}$. An important remark is that $\bar\alpha$ and $\bar\alpha^*$ are complete linear orderings.

Cuts can be thought of as new elements located between the elements of $L$: given $x \in L$, $x^- = (]-\infty, x[, [x, +\infty[)$ represents the cut placed just before $x$, while $x^+ = (]-\infty, x], ]x, +\infty[)$ is the cut placed just after $x$. We say in this case that $x^+$ is the successor of $x^-$ through $x$. However, not all cuts are successors or predecessors of another cut. A cut $c$ is a right limit (resp. a left limit) if it is not the minimal element and not of the form $x^+$ for some $x$ in $L$ (resp. not the maximal element and not of the form $x^-$). For instance, in $\omega$ the maximal cut $\top$ is a right limit.
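To make the definitions concrete, here is a small Python sketch of the cuts of a finite linear ordering. It assumes, for illustration only, that the ordering is given as a list in increasing order; the helper names `cuts`, `cut_leq`, `minus` and `plus` are hypothetical.

```python
def cuts(L):
    """All cuts (E, F) of a finite linear ordering L, given as an increasing list.

    Cuts are returned in increasing order: the i-th cut has E = the first i elements.
    """
    return [(tuple(L[:i]), tuple(L[i:])) for i in range(len(L) + 1)]


def cut_leq(c, d):
    """(E, F) <= (E', F') iff E is included in E'."""
    return set(c[0]) <= set(d[0])


def minus(L, x):
    """x^-: the cut placed just before x."""
    i = L.index(x)
    return (tuple(L[:i]), tuple(L[i:]))


def plus(L, x):
    """x^+: the cut placed just after x."""
    i = L.index(x)
    return (tuple(L[:i + 1]), tuple(L[i + 1:]))


L = ["a", "b", "c"]
all_cuts = cuts(L)
bottom, top = all_cuts[0], all_cuts[-1]   # (empty, L) and (L, empty)
inner = all_cuts[1:-1]                    # the cuts of bar(alpha)^*
print(len(all_cuts), len(inner))          # 4 2
```

In a finite ordering every cut other than $\bot$ is of the form $x^+$ (and every cut other than $\top$ of the form $x^-$), so limit cuts only appear in infinite orderings.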

Two linear orderings $\alpha = (L, <)$ and $\beta = (L', <')$ are isomorphic if there exists a bijection $f$ from $L$ onto $L'$ such that for all $x, y$ in $L$, $x < y$ iff $f(x) <' f(y)$. In this case, we also say that $(L, <)$ and $(L', <')$ have the same order type. This is an equivalence relation on the class of linear orderings. We denote by $\omega$, $\omega^*$, $\zeta$ the order types of respectively $(\mathbb{N}, <)$ (the naturals), $(-\mathbb{N}, <)$ (the nonpositive integers) and $(\mathbb{Z}, <)$ (the integers). The order type of a well-ordering is called an ordinal. Below, we often do not distinguish between a linear ordering and its type. This is safe since all the constructions we perform are isomorphism invariant.

The interested reader can find in [22] additional material on linear orderings. 

2.2 Words 

We use a generalized version of words: words indexed by a linear ordering. Given a linear ordering $\alpha = (L, <)$ and a finite alphabet $A$, an $\alpha$-word $u$ over the alphabet $A$ is a mapping from $L$ to $A$. We also say that $\alpha$ is the domain of the word $u$, or that $u$ is a word indexed by $\alpha$. Standard finite words are simply the words indexed by finite linear orderings. Given a word $u$ of domain $\alpha$ and $\beta \subseteq \alpha$, we denote by $u|_\beta$ the word $u$ restricted to its positions in $\beta$.

Given an $\alpha$-word $u$ and a $\beta$-word $v$, $uv$ denotes the $(\alpha + \beta)$-word defined by $(uv)(x) = u(x)$ if $x$ belongs to $\alpha$, and $(uv)(x) = v(x)$ if $x$ belongs to $\beta$. This construction is naturally generalized to the infinite product $\prod_{i \in \alpha} u_i$, where $\alpha$ is an order type and the $u_i$ are $\beta_i$-words; the result is a $(\sum_{i \in \alpha} \beta_i)$-word.

2.3 Semigroups and additive labellings 

For a thorough introduction to semigroups, we refer the reader to [14, 18, 19]. A semigroup $(S, \cdot)$ is a set $S$ equipped with an associative binary operator written multiplicatively. Groups and monoids are particular instances of semigroups. The set of nonempty finite words $A^+$ over an alphabet $A$ is a semigroup: it is the semigroup freely generated by $A$. A morphism of semigroups from a semigroup $(S, \cdot)$ to a semigroup $(S', \cdot')$ is a mapping $\varphi$ from $S$ to $S'$ such that for all $x, y$ in $S$, $\varphi(x \cdot y) = \varphi(x) \cdot' \varphi(y)$. An idempotent in a semigroup is an element $e$ such that $e^2 = e$.
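Idempotents are easy to enumerate in a small finite semigroup given by its elements and its product. A standard fact used throughout is that in a finite semigroup, every element $x$ has exactly one idempotent among its powers $x, x^2, x^3, \dots$. The Python sketch below illustrates both points on $(\mathbb{Z}/6\mathbb{Z}, \times)$ and $(\mathbb{Z}/5\mathbb{Z}, +)$; the function names are hypothetical helpers, not taken from the paper.

```python
def idempotents(elements, mul):
    """All idempotents e (e . e == e) of a finite semigroup."""
    return [e for e in elements if mul(e, e) == e]


def idempotent_power(x, mul):
    """The unique idempotent among the powers x, x^2, x^3, ... of x.

    In a finite semigroup the sequence of powers is ultimately periodic and
    contains exactly one idempotent, so the loop terminates.
    """
    p = x
    while mul(p, p) != p:
        p = mul(p, x)
    return p


def times_mod6(a, b):
    """Multiplication in the monoid (Z/6Z, *)."""
    return (a * b) % 6


def plus_mod5(a, b):
    """Addition in the group (Z/5Z, +)."""
    return (a + b) % 5


print(idempotents(range(6), times_mod6))  # [0, 1, 3, 4]
print(idempotents(range(5), plus_mod5))   # [0]
print(idempotent_power(2, times_mod6))    # 4
```

In a group such as $(\mathbb{Z}/5\mathbb{Z}, +)$ the only idempotent is the neutral element, which is why the examples over $\mathbb{Z}/5\mathbb{Z}$ below only involve the idempotent $0$.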
Let $\alpha$ be a linear ordering and $(S, \cdot)$ be a semigroup. A mapping $\sigma$ from pairs $(x, y)$ with $x, y \in \alpha$ and $x < y$ to $S$ is called an additive labelling if for all $x < y < z$ in $\alpha$, $\sigma(x, y) \cdot \sigma(y, z) = \sigma(x, z)$.

Given a semigroup morphism $\varphi$ from words over $A$ (equipped with concatenation) to some semigroup $(S, \cdot)$ and a word $u$ over $A$ of domain $\alpha$, there is a natural way to construct an additive labelling $\varphi_u$ from $\bar\alpha$ to $(S, \cdot)$: for every two cuts $x < y$ in $\bar\alpha$, set $\varphi_u(x, y)$ to be the image under $\varphi$ of the factor of $u$ located between $x$ and $y$. We denote by $\varphi^*_u$ the mapping restricted to $\bar\alpha^*$.
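For a finite word, the cuts can be numbered $0, 1, \dots, |u|$, cut $i$ lying between positions $i-1$ and $i$. The following Python sketch builds $\varphi_u$ in that encoding, for a morphism into $(\mathbb{Z}/5\mathbb{Z}, +)$ sending each digit to its value, and checks additivity by brute force. The encoding and the names `labelling_from_word` and `is_additive` are illustrative assumptions.

```python
from itertools import combinations


def labelling_from_word(u, phi, mul):
    """Additive labelling phi_u on the cuts 0, 1, ..., len(u) of a finite word u.

    Cut i lies between positions i-1 and i (cut 0 is bottom, cut len(u) is top),
    and phi_u(i, j) is the image under phi of the factor u[i:j].
    """
    def sigma(i, j):
        assert 0 <= i < j <= len(u)
        value = phi(u[i])
        for letter in u[i + 1:j]:
            value = mul(value, phi(letter))
        return value
    return sigma


def is_additive(sigma, points, mul):
    """Check sigma(x, y) . sigma(y, z) == sigma(x, z) for all x < y < z in points."""
    return all(mul(sigma(x, y), sigma(y, z)) == sigma(x, z)
               for x, y, z in combinations(sorted(points), 3))


def plus_mod5(a, b):
    return (a + b) % 5


u = "210232300322002"  # a word over {0, ..., 4}, letters read as digits
phi_u = labelling_from_word(u, int, plus_mod5)
print(phi_u(0, len(u)))                                  # 2
print(is_additive(phi_u, range(len(u) + 1), plus_mod5))  # True
```

Restricting the arguments to the inner cuts $1, \dots, |u|-1$ gives $\varphi^*_u$.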

2.4 Structures, graphs, trees, logics 

Relational structures Let us first remark that the definitions presented here are useless before Section 6, have marginal consequences in Section 6, and are of real interest only for Section 7.

A relational structure $(U, R_1, \dots, R_n)$ is a set $U$, called the universe, together with relations $R_1, \dots, R_n$ of fixed finite arity over $U$. Each relation $R$ has a name that we write $R$ itself. The signature of a structure contains the names involved together with their arities. A graph is a relational structure in which all relations have arity 1, except one relation of arity 2. The elements of the universe are called vertices, the unary relations are called label relations, and the binary relation is called the edge relation. A path is a finite sequence of vertices such that any two successive vertices are related by the edge relation. The first vertex is called the origin of the path, and the last vertex its destination.

Linear orderings can naturally be represented as graphs: $(L, <)$ can be seen as the graph with vertices $L$ and an edge between $x$ and $y$ iff $x < y$. For a linear ordering $\alpha = (L, <)$ and a finite alphabet $A = \{a_1, \dots, a_n\}$, an $\alpha$-word $u$ is the graph $(L, <, a_1, \dots, a_n)$ obtained from the graph of the linear ordering by interpreting each $a_i$ as $u^{-1}(a_i)$, the set of positions of the word carrying the letter $a_i$.

A tree $t$ is a graph such that there is only one edge relation, called the ancestor relation and denoted $\sqsubseteq$, satisfying:

— the relation $\sqsubseteq$ is an order,

— there is a minimal element for $\sqsubseteq$, called the root,

— for every $u$, the set $\{v : v \sqsubseteq u\}$ is an ordinal of length at most $\omega$.

The vertices of a tree are called nodes. Maximal chains of nodes in a tree are called branches.

Warning: trees are not defined by a 'direct successor' relation, but rather by the ancestor relation. This has a major impact on the logical side: all the logics we use below can refer to the ancestor relation, and it is well known that first-order logic using this ancestor relation is significantly more expressive over trees than first-order logic with access to the successor of a node only. The results would fail if the ancestor relation were not used.

A particular tree will play a special role below. The complete binary tree has universe $\{0,1\}^*$, the prefix relation as ancestor relation, and two unary relations, $0 = \{0,1\}^*0$ and $1 = \{0,1\}^*1$. We call the relation $0$ the left-child relation, while $1$ is the right-child relation. We denote by $\Delta_2$ the complete binary tree.

One constructs a tree from a graph by unfolding. Given a graph $G$ and one of its vertices $v$, the unfolding of $G$ from $v$ is the tree whose nodes are all the paths with origin $v$, whose ancestor relation is the prefix relation over paths, and in which a path $\pi$ is labelled by $a$ iff its destination is labelled by $a$ in $G$.
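Since the unfolding is usually infinite, only a finite part of it can be computed. The following Python sketch builds the nodes of the unfolding up to a given path length, together with the prefix (ancestor) relation and the inherited labels. The graph encoding (adjacency dict plus label dict), the `depth` cut-off and the name `unfold` are assumptions made for illustration.

```python
def unfold(edges, labels, v, depth):
    """Finite part (paths of at most `depth` edges) of the unfolding of a graph from v.

    edges:  dict mapping each vertex to the list of its successors.
    labels: dict mapping each vertex to its set of labels.
    Returns (nodes, ancestor, node_labels): the paths with origin v, the prefix
    (ancestor) relation on them as a set of pairs, and for each path the labels
    of its destination.
    """
    nodes, frontier = [(v,)], [(v,)]
    for _ in range(depth):
        frontier = [path + (w,) for path in frontier for w in edges.get(path[-1], [])]
        nodes.extend(frontier)
    ancestor = {(p, q) for p in nodes for q in nodes if q[:len(p)] == p}
    node_labels = {p: labels.get(p[-1], set()) for p in nodes}
    return nodes, ancestor, node_labels


# A single vertex with a loop: its unfolding is an omega-chain of a-labelled nodes.
nodes, ancestor, node_labels = unfold({0: [0]}, {0: {"a"}}, 0, depth=3)
print(len(nodes))  # 4
```

The ancestor relation is computed as the (reflexive) prefix order on paths, matching the fact that trees are given here by $\sqsubseteq$ rather than by a direct-successor relation.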

Logics For defining first-order logic, we need a countable set of first-order variables $x, y, \dots$ to pick from. The atomic formulae are $R(x_1, \dots, x_n)$ for first-order variables $x_1, \dots, x_n$ and $R$ the name of a relation of arity $n$; given two first-order variables $x, y$, $x = y$ is also an atomic formula. First-order formulae are built from these atomic formulae using the boolean connectives $\vee$, $\wedge$, $\neg$ and the first-order quantifiers $\exists x$ and $\forall x$. For monadic logic, we furthermore need a countable set of monadic variables $X, Y, \dots$. Monadic (second-order) formulae are defined as first-order formulae, but further allow the monadic quantifiers $\exists X$, $\forall X$, and the membership atomic formula $x \in X$, where $x$ is a first-order variable and $X$ a monadic one. For first-order as well as monadic formulae we use the standard notion of free variables. A formula without free variables is called a closed formula.

We write $S \models \varphi$, for a closed formula $\varphi$ and a structure $S$, to denote that the formula is true over the structure $S$. The formal definition uses the standard semantics, first-order variables ranging over elements of the universe of the structure, while monadic variables take as values subsets of the universe. We say that $S$ is a model of $\varphi$, or that $\varphi$ is satisfied over $S$. When the structure is obvious from the context, we simply state that $\varphi$ is satisfied. We also allow ourselves to write formulae as $\varphi(x_1, \dots, x_n)$ to denote that the free variables of $\varphi$ are among $\{x_1, \dots, x_n\}$. Then, given elements $u_1, \dots, u_n$ of the universe of a structure $S$, we write $S \models \varphi(u_1, \dots, u_n)$ if the formula $\varphi$ is true over $S$ under the valuation which associates $u_i$ to each $x_i$.

A relational structure $S$ has a decidable $\mathcal{L}$-theory (where $\mathcal{L}$ is either first-order or monadic logic) if there is an algorithm which, given a closed formula $\varphi$ of the logic $\mathcal{L}$, decides whether $S \models \varphi$ or not.

Interpretations An interpretation is an operation, defined by logic formulae, that defines a structure inside another one. An interpretation is given as a tuple

$$\mathcal{I} = \big(\delta(x), \varphi_1(x_1, \dots, x_{|R_1|}), \dots, \varphi_k(x_1, \dots, x_{|R_k|})\big)$$

where $\delta(x), \varphi_1(x_1, \dots, x_{|R_1|}), \dots, \varphi_k(x_1, \dots, x_{|R_k|})$ are formulae with the corresponding free variables. The interpretation is first-order if the formulae are first-order, and monadic if the formulae are monadic.

Given a structure $S$ of universe $U$, $\mathcal{I}(S)$ is the structure of universe

$$U_{\mathcal{I}(S)} = \{u \in U : S \models \delta(u)\},$$

in which the interpretation of $R_i$ is

$$R_i^{\mathcal{I}(S)} = \{(u_1, \dots, u_{|R_i|}) \in U_{\mathcal{I}(S)}^{|R_i|} : S \models \varphi_i(u_1, \dots, u_{|R_i|})\}.$$
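The two defining equations can be mirrored on finite structures. In the Python sketch below, which is a deliberate simplification and not a logic implementation, the formulae $\delta$ and $\varphi_i$ are modelled as Python predicates (free to quantify over the whole input universe, as formulae evaluated in $S$ do). The function name `interpret` and the example structure are hypothetical.

```python
from itertools import product


def interpret(universe, delta, formulas):
    """Apply an interpretation (delta, phi_1, ..., phi_k) to a finite structure.

    Simplification (assumption): formulae are modelled as Python predicates,
    which may quantify over the whole input universe, instead of syntactic
    first-order or monadic formulae.
    formulas: dict mapping each relation name R_i to (arity, phi_i).
    """
    new_universe = [u for u in universe if delta(u)]
    new_relations = {
        name: {t for t in product(new_universe, repeat=arity) if phi(*t)}
        for name, (arity, phi) in formulas.items()
    }
    return new_universe, new_relations


# Input structure S = ({0, ..., 5}, <, P) with P = {1, 3, 4}.
U = range(6)
P = {1, 3, 4}
I_of_S = interpret(
    U,
    delta=lambda x: x in P,  # delta(x) = P(x)
    formulas={
        "<": (2, lambda x, y: x < y),
        # succ(x, y) = x < y and there is no z with P(z) and x < z < y
        "succ": (2, lambda x, y: x < y and not any(z in P and x < z < y for z in U)),
    },
)
print(I_of_S)
```

Here the resulting structure is the ordering $1 < 3 < 4$ together with its successor relation, which is first-order definable in $S$ even though it is not a relation of $S$.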

A special case of interpretation is the marking. A marking replicates the structure, and adds 
some new unary relations on it. 

3 Factorisation forest theorem: various presentations for the standard case 

In this section, we present the theorem of factorisation forests. We first give the original statement in Section 3.1. Then, in Section 3.2, we provide another equivalent presentation in terms of regular expressions, possibly the most natural one. In Section 3.3, we introduce the notion of a split, and use it for a third formalisation of the result. This notion is the one used in the extensions of the factorisation forest theorem provided below.

3.1 Factorisation forest theorem 

Fix an alphabet $A$ and a semigroup morphism $\varphi$ from $A^+$ to a finite semigroup $(S, \cdot)$. A factorisation tree of a word $u \in A^+$ is an ordered unranked tree in which each node is either a leaf labelled by a letter, or an internal node, and such that the word obtained by reading the leaves from left to right (the yield) is $u$. The height of the tree is defined as usual, with the convention that the height of a tree restricted to a single leaf is 0. A factorisation tree is ramseyan (for $\varphi$) if every node 1) is a leaf, or 2) has two children, or 3) has children whose yields are all mapped by $\varphi$ to the same idempotent of $S$.

Fig. 1. A factorisation tree

Example 1. Fix $A = \{0, 1, 2, 3, 4\}$, $(S, \cdot) = (\mathbb{Z}/5\mathbb{Z}, +)$ and $\varphi$ to be the only semigroup morphism from $A^+$ to $(S, \cdot)$ mapping each letter to its value. Figure 1 presents a ramseyan factorisation tree for the word $u = 210232300322002$ ($u$ is the yield of the tree). In this drawing, internal nodes appear as horizontal lines. Double lines correspond to case 3 in the description of ramseyanity.
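For short words, the least height of a ramseyan factorisation tree can be computed by brute force. The dynamic programme below is our own illustrative sketch, not a construction from the paper: it tries, for every factor, a binary root (case 2) and a root whose children are all mapped to an idempotent $e$ (case 3). The names `min_ramseyan_height`, `height` and `chain` are hypothetical.

```python
from functools import lru_cache


def min_ramseyan_height(word, phi, mul):
    """Least height of a ramseyan factorisation tree of a nonempty word.

    Brute-force dynamic programming over factors word[i:j]; intended only for
    short words. A node is ramseyan if it is a leaf, has two children, or all
    its children have yields mapped by phi to the same idempotent.
    """
    INF = float("inf")

    @lru_cache(maxsize=None)
    def value(i, j):
        # image under phi of the factor word[i:j]
        v = phi(word[i])
        for letter in word[i + 1:j]:
            v = mul(v, phi(letter))
        return v

    @lru_cache(maxsize=None)
    def chain(i, j, e):
        # least possible max height over the factorisations of word[i:j]
        # into one or more blocks, each block being mapped to e
        best = height(i, j) if value(i, j) == e else INF
        for m in range(i + 1, j):
            if value(i, m) == e:
                best = min(best, max(height(i, m), chain(m, j, e)))
        return best

    @lru_cache(maxsize=None)
    def height(i, j):
        if j - i == 1:
            return 0  # a single leaf
        # case 2: a binary node
        best = min(1 + max(height(i, m), height(m, j)) for m in range(i + 1, j))
        # case 3: two or more children, all mapped to the idempotent e = value(i, j)
        e = value(i, j)
        if mul(e, e) == e:
            for m in range(i + 1, j):
                if value(i, m) == e:
                    best = min(best, 1 + max(height(i, m), chain(m, j, e)))
        return best

    return height(0, len(word))


def plus_mod5(a, b):
    return (a + b) % 5


print(min_ramseyan_height("0000", int, plus_mod5))      # 1
print(min_ramseyan_height("14141414", int, plus_mod5))  # 2
```

For `"0000"`, case 3 allows a single root with four leaf children, whereas binary nodes alone would need height 2. The search is exponential in spirit and only meant as a sanity check on small examples.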

The theorem of factorisation forests is then the following. 

Theorem 1 (factorisation forests). For every alphabet $A$, finite semigroup $(S, \cdot)$, semigroup morphism $\varphi$ from $A^+$ to $S$ and word $u$ in $A^+$, $u$ has a ramseyan factorisation tree of height at most $3|S|$.

The original theorem is due to Simon [24], with a bound of $9|S|$. An improved bound of $7|S|$ is provided by Chalopin and Leung [13]. The value of $3|S|$ is a byproduct of the present work.

3.2 A variant via regular expressions 

The use of factorisation trees gave the theorem the name of factorisation forests. But it is sometimes very convenient to use another formalisation, in terms of regular expressions. This presentation is new (to the knowledge of the author), but its simplicity makes it worth advertising. Let $A$ be an alphabet, $\varphi$ a semigroup morphism from $A^+$ to some semigroup $S$, and $E$ a regular expression over the alphabet $A$. $E$ is $\varphi$-ramseyan if for each occurrence $L^*$ of the Kleene star in $E$, $L$ is mapped to $\{e\}$ by $\varphi$, for some idempotent $e$ of $S$.

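Example 2. Let $S$ be $\mathbb{Z}/2\mathbb{Z}$ with addition, $A$ be $\{0, 1\}$ and $\varphi$ be the morphism from $A^+$ to $S$ sending each letter to its value modulo 2. The expression $0(0 + 10^*1)^* + 10^*1(0 + 10^*1)^*$ is $\varphi$-ramseyan and evaluates to $\varphi^{-1}(0)$. Indeed, the two starred subexpressions, $0$ and $0 + 10^*1$, are both mapped by $\varphi$ to $\{0\}$, and $0$ is idempotent.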
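The claim of Example 2 can be sanity-checked exhaustively on short words. The sketch below translates the expression into Python's `re` syntax (union `+` becomes `|`) and compares `re.fullmatch` against the parity of the number of 1s; the bound 12 and the helper names are arbitrary choices for illustration, and such a test is of course not a proof.

```python
import re
from itertools import product

# Example 2 in Python's regex syntax (union '+' is written '|').
EXAMPLE_2 = re.compile(r"0(0|10*1)*|10*1(0|10*1)*")


def phi(word):
    """Image of a word over {0, 1} in Z/2Z: its number of 1s modulo 2."""
    return word.count("1") % 2


def agrees_up_to(n):
    """Check that the expression matches exactly phi^{-1}(0) on words of length <= n."""
    for length in range(1, n + 1):
        for letters in product("01", repeat=length):
            word = "".join(letters)
            if (EXAMPLE_2.fullmatch(word) is not None) != (phi(word) == 0):
                return False
    return True


print(agrees_up_to(12))  # True
```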

Theorem 2 (variant of factorisation forests). For every alphabet $A$, finite semigroup $(S, \cdot)$, semigroup morphism $\varphi$ from $A^+$ to $S$ and $x$ in $S$, there exists a $\varphi$-ramseyan regular expression $E_x$ evaluating to $\varphi^{-1}(x)$.

Proof. By induction on $k$, define for every $x$ in $S$ the $\varphi$-ramseyan regular expression $E^k_x$ by:

$$E^0_x = \varphi^{-1}(x) \cap A, \qquad E^{k+1}_x = E^k_x + \sum_{yz = x} E^k_y E^k_z + \sum_{e^2 = e = x} E^k_e (E^k_e)^* E^k_e$$

(the last sum is empty unless $x$ is idempotent). One can show by induction on $k$ that for all $x \in S$, $E^k_x$ evaluates to the set of words in $\varphi^{-1}(x)$ possessing a ramseyan factorisation tree of height at most $k$. This proof, for both directions of the inclusion, is a direct application of the definitions. Then, by Theorem 1, $E_x = E^{3|S|}_x$ evaluates to $\varphi^{-1}(x)$. □
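The recurrence of the proof can be evaluated mechanically once all languages are truncated to words of length at most $n$, since a word of length at most $n$ only decomposes into shorter pieces. The Python sketch below does this for Example 2's setting ($\mathbb{Z}/2\mathbb{Z}$) and checks that $E^{3|S|}_x$ yields exactly $\varphi^{-1}(x)$ on words of length at most 6. The truncation and the names `concat`, `star` and `ramseyan_languages` are our assumptions; this is a small sanity check, not a substitute for the proof.

```python
from itertools import product


def concat(L1, L2, n):
    """Concatenation of two finite languages, truncated to words of length <= n."""
    return {u + v for u in L1 for v in L2 if len(u) + len(v) <= n}


def star(L, n):
    """Kleene star of a language of nonempty words, truncated to length <= n."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, L, n) - result
        result |= frontier
    return result


def ramseyan_languages(alphabet, phi, elements, mul, k, n):
    """Languages of the expressions E^k_x, x in elements, truncated to length <= n."""
    W = {x: {a for a in alphabet if phi(a) == x} for x in elements}      # E^0_x
    for _ in range(k):
        new_W = {}
        for x in elements:
            lang = set(W[x])                                             # E^k_x
            for y in elements:
                for z in elements:
                    if mul(y, z) == x:
                        lang |= concat(W[y], W[z], n)                    # E^k_y E^k_z
            if mul(x, x) == x:                                           # x is idempotent
                lang |= concat(concat(W[x], star(W[x], n), n), W[x], n)  # E^k_x (E^k_x)^* E^k_x
            new_W[x] = lang
        W = new_W
    return W


def plus_mod2(a, b):
    return (a + b) % 2


def parity(word):
    return word.count("1") % 2


S = [0, 1]
W = ramseyan_languages("01", int, S, plus_mod2, k=3 * len(S), n=6)
expected = {x: {"".join(t) for m in range(1, 7) for t in product("01", repeat=m)
                if parity("".join(t)) == x} for x in S}
print(all(W[x] == expected[x] for x in S))  # True
```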
The interest of Theorem 2 is that it allows proofs by induction on the structure of ramseyan regular expressions. Thanks to the following refinement, one can derive complexity bounds when using this technique.

Property 1 (refinement of Theorem 2). The height of the regular expression $E_x$ is at most $3|S| + 1$, counting 0 for the operator $+$, and 1 for the concatenation, the Kleene star and constants. The regular expression $E_x$ contains at most $6|S|^2$ distinct subexpressions, and at most $3|S|^2$ distinct subexpressions without the $+$-operator at the root.

These bounds are obtained from the last variant, Theorem 3.

3.3 A variant via ramseyan splits 

The third equivalent presentation of the theorem of factorisation forests uses the notion of ramseyan splits. One way to see a split is as a form of presentation of a tree. This formalisation naturally extends to infinite words, and is very natural to use in automata-theoretic constructions. The extensions of the theorem proposed in the remainder of the paper use this definition.

A split of height $N$ of a linear ordering $\alpha$ is a mapping $s$ from $\alpha$ to $[1, N]$. Given a split, two elements $x$ and $y$ of $\alpha$ such that $s(x) = s(y) = k$ are $k$-neighbours if $s(z) \ge k$ for all $z \in [x, y]$. $k$-neighbourhood is an equivalence relation over $s^{-1}(k)$. Fix an additive labelling $\sigma$ from $\alpha$ to some finite semigroup $S$. A split of $\alpha$ is ramseyan for $\sigma$ (we also say a ramseyan split for $(\alpha, \sigma)$) if for every $k \in [1, N]$, every $x < y$ and $x' < y'$ such that the elements $x, y, x', y'$ are all $k$-neighbours, $\sigma(x, y) = \sigma(x', y') = (\sigma(x, y))^2$. Equivalently, for all $k$, every class of $k$-neighbourhood is mapped by $\sigma$ to a single idempotent of the semigroup.

Example 3. Let $S$ be $\mathbb{Z}/5\mathbb{Z}$ equipped with the addition $+$. Consider the linear ordering of 17 elements and the additive labelling $\sigma$ defined by:

|2|1|0|2|3|2|3|0|0|3|2|2|0|0|0|2|

Each symbol '|' represents an element, the elements being ordered from left to right. Between two consecutive elements $x$ and $y$ is written the value of $\sigma(x, y) \in S$. In this situation, the value of $\sigma(x, y)$ for every $x < y$ is uniquely determined by the additivity of $\sigma$: it is obtained by summing all the values between $x$ and $y$ modulo 5.

A split $s$ of height 3 is the following, where we have written above each element $x$ the value of $s(x)$:

1 3 2 2 1 2 1 2 2 2 3 2 1 1 1 1 2
|2|1|0|2|3|2|3|0|0|3|2|2|0|0|0|2|

In particular, if you choose $x < y$ such that $s(x) = s(y) = 1$, then the sum of the values between them is 0 modulo 5, and 0 is the only idempotent of $\mathbb{Z}/5\mathbb{Z}$. If you choose $x < y$ such that $s(x) = s(y) = 2$ but there is no element $z$ in between with $s(z) = 1$ (i.e. $x$ and $y$ are 2-neighbours), the sum of the values separating them is also 0 modulo 5. Finally, it is impossible to find two distinct 3-neighbours in our example.
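The ramseyan condition on a finite ordering can be checked by brute force. The Python sketch below encodes the ordering as indices $0, \dots, n-1$, the split as a list of levels and $\sigma$ by the list of values between consecutive elements (as in the drawing above); these encodings and the name `is_ramseyan_split` are illustrative assumptions. It confirms that the split of Example 3 is ramseyan.

```python
def is_ramseyan_split(split, gaps, mul):
    """Check that a split of a finite linear ordering is ramseyan.

    The ordering has elements x_0 < ... < x_{n-1}; split[i] = s(x_i), and
    gaps[i] = sigma(x_i, x_{i+1}), which determines sigma by additivity.
    Brute force: for each level k, every class of k-neighbours must be mapped
    by sigma to a single idempotent.
    """
    def sigma(i, j):  # sigma(x_i, x_j) for i < j
        value = gaps[i]
        for g in gaps[i + 1:j]:
            value = mul(value, g)
        return value

    for k in set(split):
        # k-neighbour classes: runs of level-k elements, broken by any level < k
        classes, current = [], []
        for i, level in enumerate(split):
            if level < k:
                if current:
                    classes.append(current)
                current = []
            elif level == k:
                current.append(i)
        if current:
            classes.append(current)
        for cls in classes:
            values = {sigma(i, j) for a, i in enumerate(cls) for j in cls[a + 1:]}
            if len(values) > 1 or any(mul(e, e) != e for e in values):
                return False
    return True


def plus_mod5(a, b):
    return (a + b) % 5


# Example 3
split = [1, 3, 2, 2, 1, 2, 1, 2, 2, 2, 3, 2, 1, 1, 1, 1, 2]
gaps = [2, 1, 0, 2, 3, 2, 3, 0, 0, 3, 2, 2, 0, 0, 0, 2]
print(is_ramseyan_split(split, gaps, plus_mod5))  # True
```

Elements of level greater than $k$ do not break a class of $k$-neighbours, which is why only levels strictly below $k$ reset the current run.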

Theorem 3. For every finite linear ordering $\alpha$, every finite semigroup $(S, \cdot)$ and additive labelling $\sigma$ from $\alpha$ to $S$, there exists a ramseyan split for $\sigma$ of height at most $|S|$.

The proof of this result is postponed to Section 4.2, as it is a simplification of the proof of its extension, Theorem 4.

Let us state the link between ramseyan splits and factorisation trees. Fix an alphabet $A$, a semigroup $S$, a morphism $\varphi$ from $A^+$ to $S$ and a word $u \in A^+$. The following is easy to establish:

— every ramseyan factorisation tree of height $k$ of $u$ can be turned into a ramseyan split of height at most $k$ of $\varphi^*_u$,

— every ramseyan split of height $k$ of $\varphi^*_u$ can be turned into a ramseyan factorisation tree of height at most $3k$ of $u$.
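The paper only states that the first conversion is easy. One natural way to perform it, which we assume here for illustration, is to give each inner cut of $u$ the level $d + 1$, where $d$ is the depth of the internal node whose consecutive children the cut separates (the root having depth 0). Two cuts of the same level with no lower level in between then separate children of the same node, so the ramseyan condition follows from cases 2 and 3. In the sketch below, trees are nested Python lists with one-letter strings as leaves; the encoding and the name `split_of_tree` are hypothetical.

```python
def split_of_tree(tree):
    """Turn a factorisation tree into a split of the inner cuts of its yield.

    A tree is either a one-letter string (a leaf) or a list of at least two
    subtrees (an internal node). The inner cut separating two consecutive
    children of an internal node of depth d (the root has depth 0) gets level
    d + 1. Returns (yield, split), where split[i - 1] is the level of the cut
    between letters i - 1 and i.
    """
    letters, split = [], []

    def walk(node, depth):
        if isinstance(node, str):
            letters.append(node)
            return
        for index, child in enumerate(node):
            if index > 0:
                split.append(depth + 1)  # cut between child index-1 and child index
            walk(child, depth + 1)

    walk(tree, 0)
    return "".join(letters), split


# Over (Z/5Z, +): a binary root whose two children "14" are binary nodes.
print(split_of_tree([["1", "4"], ["1", "4"]]))              # ('1414', [2, 1, 2])
# A root with three children all mapped to the idempotent 0 (case 3).
print(split_of_tree([["1", "4"], ["1", "4"], ["1", "4"]]))  # ('141414', [2, 1, 2, 1, 2])
```

In the second example, the two level-1 cuts are 1-neighbours and the factor between them is $14$, mapped to the idempotent $0$; the maximal level never exceeds the height of the tree.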

Using this last argument and Theorem 3, we directly obtain a proof of Theorem 1 with the announced bound of $3|S|$. Using similar arguments, one obtains the bounds of Property 1.

4 Extension of the factorisation forest theorem to infinite words 

The contribution of this section is an extension of Theorem 3 to complete linear orderings.

Theorem 4. For every complete linear ordering $\alpha$, every finite semigroup $(S, \cdot)$ and additive labelling $\sigma$ from $\alpha$ to $S$, there exists a ramseyan split for $(\alpha, \sigma)$ of height at most $3|S|$ (at most $|S|$ if $\alpha$ is an ordinal).

Compared to Theorem 3, we trade finiteness, which is replaced by completeness, for a bound of $3|S|$, which replaces the bound of $|S|$. The special case of $\alpha$ being an ordinal proves Theorem 3, since every finite linear ordering is a complete ordinal.

The remainder of the section is devoted to the proof of Theorem 4, as well as of its ordinal version, Theorem 3. We start in Section 4.1 by establishing some elementary topological lemmas on complete linear orderings. Then, in Section 4.2, we successively give proofs of Theorems 3 and 4.

4.1 On linear orderings 

The purpose of this section is to provide preparatory lemmas on linear orderings, namely Lemmas 1 and 3. This section is not needed for the simpler proof of Theorem 3.

We consider here a binary relation $R$ over a linear ordering $\alpha$. The statement $R(x, y)$ is considered meaningful only for $x < y$, in the sense that we do not take into account the value of $R$ elsewhere. We say that a binary relation $R$ over $\alpha$ is upward closed if for every $x \le x' < y' \le y$, $R(x', y')$ implies $R(x, y)$.

Lemma 1. Let $\alpha$ be a complete linear ordering, and $R$ be an upward closed relation over $\alpha$. There exists $\gamma \subseteq \alpha$ such that for every $x < y$ in $\alpha$,

— if $R(x, y)$ then $[x, y] \cap \gamma$ is nonempty,

— if $]x, y[ \cap \gamma$ contains two distinct elements, then $R(x, y)$.

Let us first remark that if Lemma 1 holds for some linear ordering $\alpha$, then it also holds for every convex subset of $\alpha$. For this reason, we can safely add to $\alpha$ a new minimal element $\bot'$ and a new maximal element $\top'$, such that for every $x$ in $\alpha$, $R(\bot', x)$ and $R(x, \top')$. Define now, for $x \in \alpha$,

$$l(x) = \sup\{y : \forall z > x.\ R(y, z)\}, \qquad r(x) = \inf\{z : \forall y < x.\ R(y, z)\}.$$

Thanks to the addition of $\bot'$ and $\top'$, $l$ and $r$ are defined everywhere except at the minimal and maximal elements respectively.

Fact 2. The following holds.
1. Both $l$ and $r$ are nondecreasing.

2. For every $x$, $l(x) \le x \le r(x)$.

3. For every $x$, $l(x) = x$ iff $r(x) = x$.

4. For every $x$, $r^\omega(x) = \sup\{r^n(x) : n \in \mathbb{N}\}$ and $l^\omega(x) = \inf\{l^n(x) : n \in \mathbb{N}\}$ are fixpoints of both $l$ and $r$.

5. For every $x, y, z$, if $x < z \le r(z) < y$ then $R(x, y)$.

6. For every $x, y, z$, if $z < x < y < r(z)$ then $\neg R(x, y)$.

Proof. Items 1, 2, 5 and 6 follow from the definitions.

For item 3: by upward closure of $R$, $l(x) = x$ iff for every $y < x$ and $z > x$, $R(y, z)$, iff $r(x) = x$.

For item 4: let $y = r^\omega(x)$. By item 2, we have $y \le r(y)$. We have to prove $r(y) \le y$. Let $x_n$ be $r^n(x)$. If $x_{n+1} = x_n$ for some $n$, then $y = x_n = r(x_n) = r(y)$. Otherwise $x_0 < x_1 < \dots < y$. It follows from the definition of $r$ that for all $n$, $R(x_n, y)$. This implies $r(y) = y$. The case of $l^\omega(x)$ is symmetric, and by item 3 the fixpoints of $r$ are exactly the fixpoints of $l$. □

We can now prove Lemma 1.

Proof. Let $\mathrm{Fix}$ be the set of fixpoints of $r$ (equivalently, of $l$). Define the equivalence relation $\sim$ by $x \sim y$ if $x = y \in \mathrm{Fix}$, or $[\min(x, y), \max(x, y)] \cap \mathrm{Fix}$ is empty. This relation induces two kinds of equivalence classes: singletons consisting of a single fixpoint, and maximal intervals containing no fixpoint.

Let $C$ be an equivalence class of $\sim$. If $C = \{x\}$ for $x \in \mathrm{Fix}$, set $\gamma(C)$ to be $C$. Otherwise, $C$ is an interval. Fix an element $x_C$ in $C$, set $x_C^n$ to be $r^n(x_C)$ for $n \ge 0$ and $x_C^{-n}$ to be $l^n(x_C)$ for $n \ge 0$ (both definitions coincide for $n = 0$, with $x_C^0 = x_C$). By induction and using Fact 2, one easily shows that for every $n$, both $x_C^n$ and $x_C^{-n}$ belong to $\mathrm{Fix} \cup C$. Let $\gamma(C)$ be $\{x_C^n : n \in \mathbb{Z},\ x_C^n \notin \mathrm{Fix}\}$. According to the previous remark, $\gamma(C) \subseteq C$.

We now define $\gamma$ to be the union of the sets $\gamma(C)$ for $C$ ranging over the equivalence classes of $\sim$; in particular $\mathrm{Fix} \subseteq \gamma$. Let us prove that this $\gamma$ satisfies the conclusion of the lemma.

Let $x < y$ be in $\alpha$ such that $]x, y[ \cap \gamma$ contains two distinct elements. If $]x, y[$ contains two elements $x' < y'$ that are not equivalent for $\sim$, there is a fixpoint $z$ in $[x', y'] \subseteq\ ]x, y[$. Since $x < z = r(z) < y$, it follows by Fact 2 (item 5) that $R(x, y)$. Otherwise $]x, y[$ is included in some equivalence class $C$ of $\sim$. Thus, the two elements of $\gamma$ in $]x, y[$ are of the form $x_C^n$ and $x_C^m$ for some $n < m$. Since $x_C^n < r(x_C^n) \le x_C^m$, the element $r(x_C^n)$ belongs to $]x, y[$. By Fact 2 (item 5), $R(x, y)$.

Let $x < y$ be in $\alpha$ such that $R(x, y)$. If $x \not\sim y$, then by definition $\mathrm{Fix} \cap [x, y]$ is nonempty, and since $\mathrm{Fix} \subseteq \gamma$, $[x, y] \cap \gamma$ is nonempty. Otherwise $x \sim y$. Let $C$ be the equivalence class containing both $x$ and $y$. If $x_C \in [x, y]$, then $x_C$ witnesses the nonemptiness of $\gamma \cap [x, y]$. Otherwise either $x > x_C$ or $y < x_C$. The two cases are symmetric; let us treat the case $x > x_C$. By Fact 2 (item 4), $r^\omega(x_C) \in \mathrm{Fix}$, and as $x_C \sim x$, $x < r^\omega(x_C)$. Hence, there exists some $n$ in $\mathbb{N}$ such that $x_C^n = r^n(x_C) > x$. Let $n$ be the least such natural number. We have $x_C^{n-1} \le x$, and by monotonicity (Fact 2, item 1) $x_C^n \le r(x)$. Overall, $x_C^n \in [x, r(x)]$. Furthermore, since $R(x, y)$ and $R$ is upward closed, the definition of $r$ gives $r(x) \le y$. This witnesses $x_C^n \in \gamma \cap [x, y]$. □

We will also require the following lemma.

Lemma 3. For every linear ordering $\alpha$ and every natural number $k \ge 1$, there exists a mapping $c : \alpha \to \{0, \dots, k-1\}$ such that for every $x < y$ in $\alpha$ with $c(x) = c(y)$, $c([x, y]) = \{0, \dots, k-1\}$.

In fact, the weaker result needed below is the existence of a mapping $c : \alpha \to \{0, \dots, k-1\}$ such that for all $x < y$ in $\alpha$ with $c(x) = c(y) = 0$, $c([x, y]) = \{0, \dots, k-1\}$. It happens to be much easier to establish than Lemma 3.

Proof. Let $[k]$ denote $\{0, \dots, k-1\}$. We first show the result for a dense linear ordering $\beta$. Consider the set $M$ of partial mappings $c$ from $\beta$ to $[k]$ such that for every $x < y$ with $c(x) = c(y)$ defined, either $c$ is injective when restricted to $[x, y[$, or $c([x, y]) = [k]$. These mappings are ordered by $c \sqsubseteq d$ if the domain of $d$ contains the domain of $c$, and $c$ coincides with $d$ over its domain. Consider now a chain $(c_i)_{i \in I}$ of elements of $M$. It has an upper bound $b$ defined by $b(x) = c_i(x)$ if there is some $i$ such that $c_i(x)$ is defined, and $b(x)$ undefined otherwise. It is easy to check that $b$ also belongs to $M$. By Zorn's lemma, there exists a maximal element $m$ in $M$. Assume $m$ is not defined at, say, $x$. Let $Y$ be the set of elements $y$ such that $m$ is undefined over $[\min(x, y), \max(x, y)]$. By definition, $x \in Y$. There are four cases, depending on whether $x$ is the minimal (resp. the maximal) element of $Y$. If $x$ is neither the minimal nor the maximal element, there exist $y < x < z$ in $Y$. By density, we can construct a $\zeta$-indexed increasing sequence $(x_i)_{i \in \mathbb{Z}}$ included in $Y$. Define then $m'$ to coincide everywhere with $m$, except over the $x_i$'s, where $m'(x_i)$ is set to be the remainder of $i$ modulo $k$. By construction, $m'$ belongs to $M$, contradicting the maximality of $m$. If $Y$ is $\{x\}$, set $m'$ to coincide everywhere with $m$ except at $x$, where $m'(x) = 0$. Once more, $m'$ belongs to $M$, this time by remarking that every value in $[k]$ is taken by $m$ arbitrarily close to the left and to the right of $x$. This contradicts the maximality of $m$. The other possibilities for $Y$ are just combinations of the two above. Hence $m$ has to be defined everywhere, which means, by density of $\beta$, that the conclusion of the lemma holds for every dense linear ordering.

At this point, the easiest way to conclude the proof is to prove, for every $n$ in $[k]$ and every nonempty scattered linear ordering $\beta$, that there exists a mapping $c_{\beta,n}$ satisfying the conclusion of the lemma such that $c_{\beta,n}^{-1}(n)$ is nonempty. This can easily be done with the help of Hausdorff's theorem (see e.g. Chapter 5 in [22]). Then one uses the fact that every linear ordering $\alpha$ is a dense sum of scattered linear orderings (Theorem 4.9 in [22]), i.e.:

$$\alpha = \sum_{y \in \gamma} \beta_y \quad \text{with } \gamma \text{ dense, and all the } \beta_y \text{ scattered and pairwise disjoint.}$$

Then, using the case of a dense linear ordering above, we have a mapping $d$ from $\gamma$ to $[k]$ satisfying the conclusion of the lemma. Define now $c$ over $\alpha$ by $c(x) = c_{\beta_y, d(y)}(x)$ for the $y \in \gamma$ such that $x \in \beta_y$. This mapping $c$ fulfills the conclusion of the lemma. □
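For finite orderings (and in the same way for $\omega$- or $\zeta$-indexed ones), the colouring by position modulo $k$ used in the dense case already satisfies Lemma 3: two positions of the same colour are at distance at least $k$, so the closed interval between them meets every colour. The Python sketch below checks this by brute force on small finite orderings; the function names are illustrative and not taken from the paper.

```python
from itertools import combinations


def colouring_mod_k(n: int, k: int) -> list[int]:
    """Colour the positions 0..n-1 of a finite linear ordering by index mod k."""
    return [i % k for i in range(n)]


def satisfies_lemma3(c: list[int], k: int) -> bool:
    """True iff for all x < y with c[x] == c[y], the closed interval [x, y]
    uses every colour in {0, ..., k-1}."""
    all_colours = set(range(k))
    for x, y in combinations(range(len(c)), 2):
        if c[x] == c[y] and set(c[x:y + 1]) != all_colours:
            return False
    return True


if __name__ == "__main__":
    for k in range(1, 5):
        for n in range(12):
            assert satisfies_lemma3(colouring_mod_k(n, k), k)
    # Repeating colour 0 before colour 2 has appeared violates the property.
    print(satisfies_lemma3([0, 1, 0, 2], 3))
```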



4.2 Proof of the statement 

We assume here that the reader is familiar with standard semigroup theory, and in particular with Green's relations. The reader can refer to [14, 18, 19] for a presentation of the subject. Some definitions and facts are recalled below.

Below, $\sigma$ denotes the additive labelling from the complete linear ordering $\alpha$ to the finite semigroup $(S,.)$ of the theorem for complete linear orderings. We denote by $\beta$ a subordering of $\alpha$. We slightly abuse the notation, and write $(\beta, \sigma)$ for $(\beta, \sigma|_\beta)$, in which $\sigma|_\beta$ is the additive labelling obtained by restricting $\sigma$ to $\beta$. We also denote by $\sigma(\beta)$ the set $\{\sigma(x,y) : x < y,\ x, y \in \beta\}$.



Facts about finite semigroups and Green's relations 

We recall some definitions here, and gather some standard facts concerning finite semigroups. 

Given a semigroup $S$, $S^1$ denotes the monoid $S$ itself if $S$ is a monoid, or otherwise the monoid obtained by augmenting $S$ with a new neutral element $1$.

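Green's relations are defined by:

$$
\begin{array}{lll}
a \le_{\mathcal{L}} b & \text{if} & a = cb \text{ for some } c \text{ in } S^1\\
a \mathrel{\mathcal{L}} b & \text{if} & a \le_{\mathcal{L}} b \text{ and } b \le_{\mathcal{L}} a\\
a \le_{\mathcal{R}} b & \text{if} & a = bc \text{ for some } c \text{ in } S^1\\
a \mathrel{\mathcal{R}} b & \text{if} & a \le_{\mathcal{R}} b \text{ and } b \le_{\mathcal{R}} a\\
a \le_{\mathcal{J}} b & \text{if} & a = cbc' \text{ for some } c, c' \text{ in } S^1\\
a \mathrel{\mathcal{J}} b & \text{if} & a \le_{\mathcal{J}} b \text{ and } b \le_{\mathcal{J}} a\\
a \le_{\mathcal{H}} b & \text{if} & a \le_{\mathcal{L}} b \text{ and } a \le_{\mathcal{R}} b\\
a \mathrel{\mathcal{H}} b & \text{if} & a \mathrel{\mathcal{L}} b \text{ and } a \mathrel{\mathcal{R}} b
\end{array}
$$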



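Fact 4. Let $a, b, c$ be in $S$. If $a \mathrel{\mathcal{L}} b$ then $ac \mathrel{\mathcal{L}} bc$. If $a \mathrel{\mathcal{R}} b$ then $ca \mathrel{\mathcal{R}} cb$. For every $a, b$ in $S$, $a \mathrel{\mathcal{L}} c \mathrel{\mathcal{R}} b$ for some $c$ iff $a \mathrel{\mathcal{R}} c' \mathrel{\mathcal{L}} b$ for some $c'$.

As a consequence of the last equivalence, one defines the last of Green's relations:

$$
\begin{array}{lll}
a \mathrel{\mathcal{D}} b & \text{if} & a \mathrel{\mathcal{L}} c \mathrel{\mathcal{R}} b \text{ for some } c \text{ in } S\\
 & \text{iff} & a \mathrel{\mathcal{R}} c \mathrel{\mathcal{L}} b \text{ for some } c \text{ in } S
\end{array}
$$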

The key result is the following (here the finiteness of $S$ is mandatory):
Fact 5. $\mathcal{D} = \mathcal{J}$.

For this reason, from now on we refer only to $\mathcal{D}$ and not to $\mathcal{J}$. However, we will use the preorder $\le_{\mathcal{J}}$ (which is an order over the $\mathcal{D}$-classes).

An element $a$ in $S$ is called regular if $asa = a$ for some $s$ in $S$. A $\mathcal{D}$-class is regular if all its elements are regular.

Fact 6. A $\mathcal{D}$-class $D$ is regular iff it contains an idempotent, iff every $\mathcal{L}$-class in $D$ contains an idempotent, iff every $\mathcal{R}$-class in $D$ contains an idempotent, iff there exist $a, b$ in $D$ such that $ab \in D$.

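Fact 7. For every $a, b$ in $D$ such that $ab \in D$, $a \mathrel{\mathcal{R}} ab$ and $b \mathrel{\mathcal{L}} ab$. Furthermore, there is an idempotent $e$ in $D$ such that $a \mathrel{\mathcal{L}} e$ and $b \mathrel{\mathcal{R}} e$.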

Fact 8 (from Green's lemma). All $\mathcal{H}$-classes in a $\mathcal{D}$-class have the same cardinality.

Fact 9. Let $H$ be an $\mathcal{H}$-class in $S$. Either for all $a, b$ in $H$, $ab \notin H$; or for all $a, b$ in $H$, $ab \in H$, and furthermore $(H,.)$ is a group.
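To make the definitions above concrete, the following Python sketch computes Green's relations by brute force in the full transformation monoid $T_3$ (all maps $\{0,1,2\} \to \{0,1,2\}$, with $f.g$ meaning "apply $f$, then $g$"; the example and the convention are our own choice). It checks Fact 5 ($\mathcal{D} = \mathcal{J}$), the existence of an idempotent in every $\mathcal{D}$-class (Fact 6, since $T_3$ is regular) and Fact 8 on this example. Since $T_3$ has an identity, $S^1 = S$ and the preorders reduce to comparing principal ideals.

```python
from itertools import product

N = 3
ELEMS = list(product(range(N), repeat=N))  # the 27 maps {0,1,2} -> {0,1,2}
INDEX = {f: i for i, f in enumerate(ELEMS)}
S = range(len(ELEMS))                      # T_3 has an identity, so S^1 = S


def compose(f, g):
    """f.g: apply f first, then g (maps are tuples of images)."""
    return tuple(g[f[i]] for i in range(N))


MUL = [[INDEX[compose(f, g)] for g in ELEMS] for f in ELEMS]
IDEMPOTENTS = {a for a in S if MUL[a][a] == a}

# Principal ideals S^1 b, b S^1 and S^1 b S^1: a <=_X b iff a lies in the ideal of b.
LEFT = [frozenset(MUL[c][b] for c in S) for b in S]
RIGHT = [frozenset(MUL[b][c] for c in S) for b in S]
TWO_SIDED = [frozenset(MUL[MUL[c][b]][d] for c in S for d in S) for b in S]


def classes(ideal):
    """a ~ b iff a <= b and b <= a, i.e. a and b generate the same ideal."""
    groups = {}
    for a in S:
        groups.setdefault(ideal[a], set()).add(a)
    return {frozenset(g) for g in groups.values()}


L_CLASSES, R_CLASSES, J_CLASSES = classes(LEFT), classes(RIGHT), classes(TWO_SIDED)
H_CLASSES = {L & R for L in L_CLASSES for R in R_CLASSES if L & R}


def d_class(a):
    """All b such that a L c R b for some c."""
    l_class = next(L for L in L_CLASSES if a in L)
    return frozenset(b for R in R_CLASSES if R & l_class for b in R)


D_CLASSES = {d_class(a) for a in S}

if __name__ == "__main__":
    print(sorted(len(D) for D in D_CLASSES))        # sizes of the D-classes
    print(D_CLASSES == J_CLASSES)                   # Fact 5 on this example
    print(all(D & IDEMPOTENTS for D in D_CLASSES))  # every D-class contains an idempotent
```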

Case of a group $\mathcal{H}$-class.

Lemma 10. Let $H$ be an $\mathcal{H}$-class in $S$ such that $(H,.)$ is a group, and let $\beta$ be such that $\sigma(\beta) \subseteq H$. Then there exists a ramseyan split of $(\beta, \sigma)$ of height at most $|H|$.

Proof. Since $(H,.)$ is a group, it is natural to extend the definition of $\sigma$ over $\beta$ in the following way. For every $x$, let $\sigma(x,x)$ be $1_H$, the neutral element of the group $(H,.)$; for every $y < x$ in $\beta$, let $\sigma(x,y)$ be $\sigma(y,x)^{-1}$, the inverse of $\sigma(y,x)$ in $H$. As expected, this extended version of $\sigma$ satisfies, for every $x, y, z$ in $\beta$, $\sigma(x,z) = \sigma(x,y)\sigma(y,z)$. Let $n$ be a mapping numbering the elements of $H$ from $1$ to $|H|$. Fix an element $x_0$ in $\beta$. Let $s$ be defined for all $x$ by $s(x) = n(\sigma(x_0, x))$.

Let us show that $s$ defined this way is indeed a ramseyan split for $\sigma$. Let $x < y$ be such that $s(x) = s(y)$; then $\sigma(x_0,x) = \sigma(x_0,y)$ since $n$ is a bijection from $H$ onto $[1, |H|]$. Hence $\sigma(x,y) = \sigma(x,x_0)\sigma(x_0,y) = \sigma(x_0,x)^{-1}\sigma(x_0,y) = 1_H$. Hence, given $x < y$ and $x' < y'$ pairwise $k$-neighbours, $\sigma(x,y) = 1_H = \sigma(x',y') = 1_H^2$. □
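The construction in this proof is easy to run on a finite word. The sketch below assumes $H = \mathbb{Z}_n$ written additively (an illustrative choice): positions are the cuts $0, \dots, m$ of a word of length $m$, $\sigma(x,y)$ is the sum of the letters between $x$ and $y$, $x_0 = 0$, and the numbering is $g \mapsto g + 1$. The check verifies the key step of the proof: equal values of $s$ force $\sigma(x,y)$ to be the neutral element.

```python
from itertools import combinations


def group_split(word: list[int], n: int) -> list[int]:
    """Split of Lemma 10 for H = Z_n (written additively), with x0 = 0.

    Positions are the cuts 0..len(word); sigma(x0, x) is the sum of
    word[:x] mod n, numbered by g -> g + 1, so values lie in [1, n]."""
    split, total = [1], 0  # sigma(x0, x0) is the neutral element 0
    for letter in word:
        total = (total + letter) % n
        split.append(total + 1)
    return split


def sigma(word: list[int], n: int, x: int, y: int) -> int:
    """Additive labelling: sum of the letters between cuts x < y, mod n."""
    return sum(word[x:y]) % n


def equal_values_give_neutral(word: list[int], n: int) -> bool:
    """Key step of the proof: s(x) = s(y) implies sigma(x, y) = 0."""
    s = group_split(word, n)
    return all(sigma(word, n, x, y) == 0
               for x, y in combinations(range(len(s)), 2) if s[x] == s[y])


if __name__ == "__main__":
    w = [2, 3, 1, 4, 4, 0, 1]
    print(group_split(w, 5), equal_values_give_neutral(w, 5))
```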

Case of a regular $\mathcal{D}$-class.

Lemma 11. Let $D$ be a regular $\mathcal{D}$-class in $S$, and let $\beta$ be such that $\sigma(\beta) \subseteq D$. Then there exists a ramseyan split of $(\beta, \sigma)$ of height at most $|D|$.

Proof. For every nonmaximal $x \in \beta$, set $r(x)$ to be the $\mathcal{R}$-class of $\sigma(x,z)$ for some $z > x$; this value is independent of the choice of $z$ according to Fact 7. Similarly, for every nonminimal $x$ in $\beta$, set $l(x)$ to be the $\mathcal{L}$-class of $\sigma(y,x)$ for some $y < x$. If $\beta$ has a maximal element $M$, choose $r(M)$ such that $l(M) \cap r(M)$ is a subgroup of $S$; this is possible according to Fact 6. Similarly, if $\beta$ has a minimal element $m$, choose $l(m)$ such that $l(m) \cap r(m)$ is a subgroup of $S$. Set, for all $x$ in $\beta$, $h(x) = l(x) \cap r(x)$.

We claim that for every $x$ in $\beta$, $h(x)$ is a subgroup of $S$. Indeed, if $x$ is either the minimal or the maximal element of $\beta$, this follows from the definition of $r(M)$ and $l(m)$. Otherwise, there exist $y, z$ such that $y < x < z$. Let $a$ be $\sigma(y,x) \in l(x)$ and $b$ be $\sigma(x,z) \in r(x)$. By Fact 7, since $ab = \sigma(y,z) \in D$, there exists an idempotent $e$ in $D$ such that $a \mathrel{\mathcal{L}} e$ and $b \mathrel{\mathcal{R}} e$, i.e. $e \in h(x)$. And by Fact 9, $h(x)$ is a subgroup of $S$. The claim holds.

According to Fact 8, there is a natural number $N$ such that all $\mathcal{H}$-classes included in $D$ have cardinality $N$. Let $H_1, \dots, H_d$ be the $\mathcal{H}$-classes included in $D$ which are subgroups of $S$. For $k$ in $\{1, \dots, d\}$, set $\beta_k$ to be $\{x \in \beta : h(x) = H_k\}$. For $x < y$ in $\beta_k$, $\sigma(x,y)$ belongs to $r(x) \cap l(y)$, which is $H_k$; hence $\sigma(\beta_k) \subseteq H_k$. By Lemma 10, there exists a ramseyan split $s_k$ for $(\beta_k, \sigma)$ of height at most $|H_k| = N$.

We now set, for all $x$ in $\beta$, $s(x)$ to be $(k-1)N + s_k(x)$, where $k$ is such that $x \in \beta_k$. Let us establish that $s$ is a ramseyan split for $(\beta, \sigma)$. Let $x < y$ and $x' < y'$ be pairwise $k'$-neighbours for some $k'$; in particular $s(x) = s(y) = s(x') = s(y')$. By definition of $s$, $x, y, x', y'$ belong to the same $\beta_k$. Furthermore, since $s(x) = s(y) = s(x') = s(y')$, we have $s_k(x) = s_k(y) = s_k(x') = s_k(y')$, and $x < y$ and $x' < y'$ are also neighbours for $s_k$ in $\beta_k$. Hence, by ramseyanity of $s_k$ over $(\beta_k, \sigma)$, $\sigma(x,y) = \sigma(x',y') = \sigma(x,y)^2$. We conclude that the mapping $s$ is a ramseyan split for $(\beta, \sigma)$. Its height is bounded by $dN \le |D|$. □

The general case for ordinals: proof of Theorem 3

For this last part of the proof, one has to provide factorisations on ordinals where the minimal element has been removed. Without this, one does not obtain the announced bound of $|S|$. Hence, given a linear well-ordering $\beta$, one denotes by $\mathring{\beta}$ the linear ordering $\beta \setminus \{0_\beta\}$.

Lemma 12. Let $E \subseteq S$ be a $\mathcal{D}$-closed subset of $S$ and $\beta \subseteq \alpha$ be such that $\sigma(\beta) \subseteq E$. Then there exists a ramseyan split of $(\mathring{\beta}, \sigma)$ of height at most $|E|$.

Proof. The proof is done by induction on the size of $E$. If $E$ is empty, then $\beta$ contains at most one element. Hence $\mathring{\beta}$ is empty, and the split of height $0$ over the empty linear ordering is suitable.

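Otherwise, let $D$ be a minimal $\mathcal{D}$-class in $E$ (for the $\le_{\mathcal{J}}$-order). Let $\gamma \subseteq \beta$ be the least set satisfying:

- $0_\beta \in \gamma$, where $0_\beta$ is the minimal element of $\beta$,

- if $x \in \gamma$ then $\min\{y > x : \sigma(x,y) \in D\} \in \gamma$ (whenever this set is nonempty).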

It is not difficult to check that the following fact holds. 

Fact 13. For every $x < y$ in $\beta$, if $]x,y] \cap \gamma$ is empty, then $\sigma(x,y) \notin D$. If $[x,y] \cap \gamma$ contains two elements, then $\sigma(x,y) \in D$.

Define the equivalence relation $\sim$ over $\beta$ by $x \sim y$ if $]x,y] \cap \gamma = \emptyset$ for $x < y$, closed under reflexivity and symmetry. Let $\eta$ be an equivalence class for $\sim$. By Fact 13, $\sigma(\eta) \cap D = \emptyset$. Hence, one can apply the induction hypothesis and obtain a ramseyan split $s_\eta$ for $(\mathring{\eta}, \sigma)$ of height at most $|E| - |D|$. Remark that $\mathring{\eta} = \eta \setminus \gamma$.

At this point, two cases may happen depending on the regularity of $D$. If $D$ is not regular, then $\gamma$ contains at most 2 elements. Indeed, assume $x < y < z$ in $\gamma$; then $\sigma(x,y)$, $\sigma(y,z)$ and $\sigma(x,y)\sigma(y,z) = \sigma(x,z)$ belong to $D$. By Fact 6, $D$ would be regular, a contradiction. Define $s$ over $\mathring{\beta}$ by $s(x) = 1$ for $x \in \gamma$, else $s(x) = s_\eta(x) + 1$ for $\eta$ the equivalence class of $x$. This split is ramseyan since the value $1$ is used at most once ($\gamma$ has at most one element besides $0_\beta$), and ramseyanity is inherited from the induction hypothesis elsewhere. By induction hypothesis, this split has height at most $|E| - |D| + 1 \le |E|$.

Finally, assume $D$ is regular. By Fact 13, $\sigma(\gamma) \subseteq D$. By Lemma 11, we obtain a ramseyan split $s_\gamma$ of height at most $|D|$ for $(\gamma, \sigma)$. Then define $s$ over $\mathring{\beta}$ by $s(x) = s_\gamma(x)$ for $x \in \gamma$, else $s(x) = |D| + s_\eta(x)$ for $\eta$ the equivalence class of $x$. It follows from the definition that $s$ is a ramseyan split of $(\mathring{\beta}, \sigma)$ of height at most $|E| - |D| + |D| = |E|$. □
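On a finite ordinal, the set $\gamma$ of this proof can be built greedily, and Fact 13 can be checked exhaustively. The Python sketch below does this for words over the multiplicative monoid $\mathbb{Z}_4$, where $D = \{0\}$ is the $\le_{\mathcal{J}}$-minimal $\mathcal{D}$-class; the semigroup, the word and the helper names are illustrative assumptions, not part of the paper.

```python
from functools import reduce
from itertools import combinations


def sigma(word, mul, x, y):
    """Additive labelling on the cuts 0..len(word): product of word[x:y], x < y."""
    return reduce(mul, word[x:y])


def gamma_greedy(word, mul, in_D):
    """Least set of Lemma 12 on the finite ordinal {0, ..., len(word)}: it contains 0
    and, with x, the least y > x such that sigma(x, y) is in D (when such y exists)."""
    gamma, x, m = [0], 0, len(word)
    while True:
        nxt = next((y for y in range(x + 1, m + 1) if in_D(sigma(word, mul, x, y))), None)
        if nxt is None:
            return gamma
        gamma.append(nxt)
        x = nxt


def sim_classes(gamma, m):
    """Classes of x ~ y iff ]x, y] misses gamma: the intervals [g_i, g_{i+1}[."""
    bounds = gamma + [m + 1]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(len(gamma))]


def check_fact13(word, mul, in_D):
    """Exhaustively check both statements of Fact 13 on a finite word."""
    g = set(gamma_greedy(word, mul, in_D))
    for x, y in combinations(range(len(word) + 1), 2):
        hits = [z for z in range(x, y + 1) if z in g]   # [x, y] ∩ gamma
        in_d = in_D(sigma(word, mul, x, y))
        if all(z == x for z in hits) and in_d:          # ]x, y] ∩ gamma is empty
            return False
        if len(hits) >= 2 and not in_d:
            return False
    return True


if __name__ == "__main__":
    mul4 = lambda a, b: a * b % 4          # (Z_4, x): D = {0} is J-minimal
    word = [3, 2, 1, 2, 3, 3, 2, 2, 1]
    gamma = gamma_greedy(word, mul4, lambda v: v == 0)
    print(gamma, sim_classes(gamma, len(word)))
    print(check_fact13(word, mul4, lambda v: v == 0))
```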
We can now conclude the proof of Theorem 3.

Proof. Let $\alpha$ be an ordinal and $\sigma$ an additive labelling from $\alpha$ to $S$. Fix a value $a_0$ in $S$, and construct the linear ordering $\alpha' = 1 + \alpha$, where $1$ is a linear ordering containing the single element $0$. Set $\sigma'(x,y)$ for $x < y$ in $\alpha$ to be $\sigma(x,y)$, set $\sigma'(0, 0_\alpha)$ to be $a_0$, and set $\sigma'(0,y)$ to be $a_0 \cdot \sigma(0_\alpha, y)$ for $y > 0_\alpha$. Defined like this, $\sigma'$ is an additive labelling from $\alpha'$ to $S$. By Lemma 12, there exists a ramseyan split $s$ for $(\mathring{\alpha}', \sigma')$ of height at most $|S|$. Since $\mathring{\alpha}' = \alpha$ and $\sigma'$ coincides with $\sigma$ over $\alpha$, $s$ is also a ramseyan split for $(\alpha, \sigma)$. □

The general case for complete orderings

The theorem for complete linear orderings follows directly from the following lemma, with $E = S$.

Lemma 14. Let $E \subseteq S$ be a $\mathcal{D}$-closed subset of $S$ and $\beta \subseteq \alpha$ be complete and such that $\sigma(\beta) \subseteq E$. Then there exists a ramseyan split of $(\beta, \sigma)$ of height at most $3|E|$.

Proof. We assume wlog that $\beta$ is nonempty. The proof is done by induction on the size of $E$. Let $D$ be a minimal $\mathcal{D}$-class in $E$ (for the $\le_{\mathcal{J}}$-order). We define a binary relation $R$ over $\beta$ by: for every $x < y$ in $\beta$, $R(x,y)$ if $\sigma(x,y) \in D$. Since $D$ is a minimal $\mathcal{D}$-class, this relation is upward closed; we can apply Lemma 1 and obtain a set $\gamma$ satisfying its conclusion.

Define the equivalence relation $\sim$ over $\beta \setminus \gamma$ by $x \sim y$ if $[x,y] \cap \gamma = \emptyset$ for $x < y$, closed under reflexivity and symmetry. Let $\eta$ be an equivalence class for $\sim$. By Lemma 1, from which $\gamma$ is obtained, $\sigma(\eta) \cap D = \emptyset$. Hence, one can apply the induction hypothesis and obtain a ramseyan split $s_\eta$ for $(\eta, \sigma)$. At this point, two cases may happen depending on the regularity of $D$.

If $D$ is not regular, then $\gamma$ contains at most 2 elements (same argument as in the case of $\alpha$ being an ordinal). Let us treat the case of $\gamma$ containing two elements $x_0 < x_1$ (the cases of $\gamma$ being empty or a singleton can be deduced from it). The equivalence $\sim$ has at most three equivalence classes, $\eta = (-\infty, x_0[$, $\eta' = ]x_0, x_1[$ and $\eta'' = ]x_1, +\infty)$. We can apply the induction hypothesis with $\sigma(\eta) \subseteq E \setminus D$ (resp. $\sigma(\eta') \subseteq E \setminus D$ and $\sigma(\eta'') \subseteq E \setminus D$) and obtain a ramseyan split $s_\eta$ for $(\eta, \sigma)$ (resp. $s_{\eta'}$ for $(\eta', \sigma)$ and $s_{\eta''}$ for $(\eta'', \sigma)$) of height at most $3(|E| - |D|)$. We construct $s$ over $\beta$ by $s(x) = s_\eta(x) + 2$ if $x \in \eta$, $s(x_0) = 1$, $s(x) = s_{\eta'}(x) + 2$ if $x \in \eta'$, $s(x_1) = 2$, and $s(x) = s_{\eta''}(x) + 2$ for $x \in \eta''$. It follows from the definition that $s$ is a ramseyan split of $(\beta, \sigma)$ of height at most $3(|E| - |D|) + 2 < 3|E|$.

Else, if $D$ is regular, we apply Lemma 3 on $\gamma$ with $k = 3$ and obtain a mapping $c : \gamma \to \{0,1,2\}$ satisfying the conclusion of Lemma 3. By Lemma 1, $\sigma(c^{-1}(0)) \subseteq D$. We can apply Lemma 11 to $c^{-1}(0)$, obtaining a ramseyan split $s'$ for $(c^{-1}(0), \sigma)$ of height at most $|D|$. Let $x$ be in $\beta$; we define

$$
s(x) = \begin{cases}
s'(x) & \text{if } x \in \gamma \text{ and } c(x) = 0,\\
|D| + c(x) & \text{if } x \in \gamma \text{ and } c(x) \in \{1,2\},\\
|D| + 2 + s_\eta(x) & \text{if } x \notin \gamma \text{ and } \eta \text{ is the } {\sim}\text{-equivalence class of } x.
\end{cases}
$$

Let us first remark that the values corresponding to the first case of the definition range in $[1, |D|]$ (by definition of $s'$). The values of the second case lie in $[|D|+1, |D|+2]$ by construction. Finally, the values provided by the last case all lie in $[|D|+3, |D|+2+3(|E|-|D|)]$, which is included in $[|D|+3, 3|E|]$.

We have to prove the ramseyanity of $s$. Let $x < y$ and $x' < y'$ be pairwise $k$-neighbours for some $k$. If $k \in [1, |D|]$, we are in the first case of the definition of $s$, and $\sigma(x,y) = \sigma(x',y') = \sigma(x,y)^2$ by ramseyanity of $s'$. If $k \in [|D|+1, |D|+2]$, then $c(x) = c(y)$ and by Lemma 3 there is some $z$ in $]x,y[$ with $c(z) = 0$. This implies $s(z) \le |D|$, contradicting the fact that $x$ and $y$ are $k$-neighbours. Finally, if $k \ge |D|+3$, since $x$, $y$, $x'$ and $y'$ are $k$-neighbours, they all lie in the same $\sim$-equivalence class $\eta$, and $\sigma(x,y) = \sigma(x',y') = \sigma(x,y)^2$ by ramseyanity of $s_\eta$. □
5 Application to countable scattered linear orderings

In this section, we use the theorem for complete linear orderings to give a new, simplified proof of Theorem 5 (known from [10]). We first briefly recall some facts about scattered linear orderings in Section 5.1 and define the corresponding notions for words. Then we introduce automata on countable scattered words in Section 5.2 and the corresponding algebraic notion of a $\diamond$-semigroup in Section 5.3. In Section 5.4 we prove Theorem 5.

This section is independent of the subsequent ones.

5.1 Scattered linear orderings 

A linear ordering $\alpha$ is dense if for every $x < y$ in $\alpha$, there exists $z$ in $]x,y[$. A linear ordering is scattered if none of its suborderings with at least two elements is dense. For instance $(\mathbb{Q}, <)$ and $(\mathbb{R}, <)$ are dense, while $(\mathbb{N}, <)$ and $(\mathbb{Z}, <)$ are scattered. Being scattered is preserved under taking a subordering. A scattered sum of scattered linear orderings also yields a scattered linear ordering. Every ordinal is scattered. Furthermore, if $\alpha$ is scattered, then so is the linear ordering of its cuts; and if $\alpha$ is countable and scattered, then the linear ordering of its cuts is also countable and scattered.

Given an alphabet $A$, we denote by $A^\diamond$ the set of words indexed by a countable scattered linear ordering. Given a language $L \subseteq A^\diamond$, $L^\omega$ represents the set of words of the form $u_0 u_1 u_2 \cdots$ where all the $u_i$'s belong to $L$. One defines similarly $L^{-\omega}$ and $L^\zeta$.

A standard way of proving results on scattered linear orderings is to use Hausdorff's theorem (Chapter 5 of [22] is dedicated to the subject). It establishes a general way of decomposing scattered linear orderings. Hausdorff's theorem is a key tool in the original proof of Theorem 5 [10]. We avoid it below; instead, we use the following lemma, which provides a kind of induction principle for scattered linear orderings. It essentially says that if an equivalence relation merges any two contiguous sets of equivalent elements (sets with nothing in between), then it relates all elements.

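Lemma 15. Given a scattered linear ordering $\alpha$ and an equivalence relation $R$ over $\alpha$ satisfying, for all $X < Y$ with $X^2 \subseteq R$ and $Y^2 \subseteq R$,

$$\bigcap_{x \in X,\, y \in Y} ]x,y[ \;=\; \emptyset \quad \text{implies} \quad (X \cup Y)^2 \subseteq R,$$

then $R = \alpha^2$.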

Proof. Consider the set $\mathcal{S}$ of equivalence relations included in $R$ such that every equivalence class is convex. It is nonempty since the equality relation over $\alpha$ belongs to $\mathcal{S}$. Order $\mathcal{S}$ by inclusion. Given a chain in $\mathcal{S}$, the union of all relations in the chain is itself an element of $\mathcal{S}$: the chain has an upper bound in $\mathcal{S}$. Then, according to Zorn's lemma, there is a maximal element $\sim$ in $\mathcal{S}$. Since $\alpha$ is scattered and $\sim \in \mathcal{S}$, $\alpha/{\sim}$ is itself a scattered linear ordering. Assume that it has two distinct equivalence classes. Since $\alpha/{\sim}$ is scattered, there are two equivalence classes $X$ and $Y$, with wlog $X < Y$, such that there is no other equivalence class $Z$ with $X < Z < Y$. It follows that $\bigcap_{x \in X, y \in Y} ]x,y[ = \emptyset$. Applying the hypothesis leads to $(X \cup Y)^2 \subseteq R$, and consequently ${\sim} \cup (X \cup Y)^2 \in \mathcal{S}$. This contradicts the maximality of $\sim$. □

5.2 Automata over countable scattered linear orderings 

In this section, we define priority automata and show how they accept words indexed by countable scattered linear orderings. These automata were introduced in [7], but in their 'Muller' form, whereas here we adopt the 'parity-like' approach.
Definition 1. A priority automaton $\mathcal{A} = (Q, A, I, F, p, \delta)$ consists of a finite set of states $Q$, a finite alphabet $A$, a set of initial states $I$, a set of final states $F$, a priority mapping $p : Q \to [1, N]$ ($N$ being a natural) and a transition relation $\delta \subseteq (Q \times A \times Q) \uplus ([1,N] \times Q) \uplus (Q \times [1,N])$.

A run of the automaton $\mathcal{A}$ over an $\alpha$-word $u$ is a mapping $\rho$ from the cuts of $\alpha$ to $Q$ such that for all cuts $c, c'$:

- if $c'$ is the successor of $c$ through $x$, then $(\rho(c), u(x), \rho(c')) \in \delta$,

- if $c$ is a left limit, then $(k, \rho(c)) \in \delta$ where $k = \max \bigcap_{c' < c} p(\rho(]c', c[))$,

- if $c$ is a right limit, then $(\rho(c), k) \in \delta$ where $k = \max \bigcap_{c' > c} p(\rho(]c, c'[))$.

The first case corresponds to standard automata on finite words: a transition links one state to another while reading a single letter of the word. The second case verifies that the highest priority appearing infinitely close to the left of $c$ corresponds to a transition. The third case is symmetric. An $\alpha$-word $u$ is accepted by $\mathcal{A}$ if there is a run $\rho$ of $\mathcal{A}$ over $u$ such that $\rho(\bot) \in I$ and $\rho(\top) \in F$.

Example 4. Consider the automaton with states $\{q, r\}$, alphabet $\{a\}$, initial states $\{q, r\}$, final state $q$, priority mapping constant equal to $0$, and transitions $\{(q,a,q), (q,a,r), (0,q), (r,0)\}$. It accepts those words in $\{a\}^\diamond$ which have a complete domain. For this, note that a linear ordering is complete iff no cut is simultaneously a left and a right limit.

Consider a word $u \in \{a\}^\diamond$ which has a complete domain $\alpha$. For every cut $c$ of $\alpha$, set $\rho(c)$ to be $q$ if $c$ is $\top$ or if $c$ has a successor, else $\rho(c)$ is $r$. Under the hypothesis of completeness, it is simple to verify that $\rho$ is a run witnessing the acceptance of the word. Conversely, assume that there is a run $\rho$ over the $\alpha$-word $u$ with $\alpha$ not complete. There is a cut $c$ of $\alpha$ which is both a left and a right limit. If $\rho(c)$ is $r$, then, as $c$ is a left limit, there is no corresponding transition; else, if $\rho(c)$ is $q$, the same argument applies to the right of $c$. In both cases there is a contradiction.
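For $\omega$-words, the only limit cut is $\top$, and the run conditions of Definition 1 reduce to a parity-like check on the states that repeat infinitely often. The Python sketch below checks a run given in "lasso" form (a prefix followed by a repeated loop) on an ultimately periodic $\omega$-word, and instantiates it with the automaton of Example 4. The data layout and function names are our own assumptions, right-limit transitions are never needed for $\omega$-words, and general scattered domains are not handled.

```python
from dataclasses import dataclass


@dataclass
class PriorityAutomaton:
    initial: frozenset
    final: frozenset
    priority: dict           # state -> priority
    letter_trans: frozenset  # triples (p, letter, q)
    left_limit: frozenset    # pairs (k, q): enter q at a left limit of max priority k
    right_limit: frozenset   # pairs (q, k); unused for omega-words


def accepts_lasso_run(A, prefix, loop, run_prefix, run_loop, run_top):
    """Check a candidate run on the omega-word prefix . loop^omega.

    Cut i carries run_prefix[i] for i < len(prefix), then the states of
    run_loop repeat; the last cut (top) is a left limit and carries run_top."""
    if len(prefix) != len(run_prefix) or len(loop) != len(run_loop) or not loop:
        raise ValueError("expected one state per letter and a nonempty loop")
    states = run_prefix + run_loop + run_loop[:1]  # includes the wrap-around cut
    letters = prefix + loop
    for i, letter in enumerate(letters):           # successor cuts
        if (states[i], letter, states[i + 1]) not in A.letter_trans:
            return False
    k = max(A.priority[q] for q in run_loop)       # max priority seen infinitely often
    return (states[0] in A.initial and run_top in A.final
            and (k, run_top) in A.left_limit)


# The automaton of Example 4 (constant priority 0).
EXAMPLE4 = PriorityAutomaton(
    initial=frozenset({"q", "r"}),
    final=frozenset({"q"}),
    priority={"q": 0, "r": 0},
    letter_trans=frozenset({("q", "a", "q"), ("q", "a", "r")}),
    left_limit=frozenset({(0, "q")}),
    right_limit=frozenset({("r", 0)}),
)

if __name__ == "__main__":
    # a^omega: every finite cut has a successor, so the run of Example 4 stays in q.
    print(accepts_lasso_run(EXAMPLE4, [], ["a"], [], ["q"], "q"))
```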

The languages accepted by priority automata are closed under union, intersection, concatenation, projection and exponentiation by $\omega$ and $-\omega$ [7]. They also admit an equivalent form of regular expressions [7], and their emptiness problem is decidable. A consequence of Theorem 5 below is their closure under complementation (originally proved in [10], and in [9] for a particular case).

5.3 On $\diamond$-semigroups

Finite semigroups are known to have the same 'expressive power' as finite state automata. This approach has been extended to languages of $\omega$-words by introducing $\omega$-semigroups in [10]. Then Bedon and Carton generalized it to words indexed by countable ordinals in [2], the corresponding algebraic object being called an $\omega_1$-semigroup. Finally, Carton and Rispal introduced $\diamond$-semigroups for describing languages of words indexed by scattered linear orderings.

Formally, a $\diamond$-semigroup $(S, \pi)$ is a set $S$ equipped with an operator $\pi$ mapping $S^\diamond$ to $S$ which satisfies:

- for all $s \in S$, $\pi(s) = s$, and,

- for all countable scattered linear orderings $\alpha$ and families $(u_i)_{i \in \alpha}$ of words in $S^\diamond$,

$$\pi\Big(\prod\{\pi(u_i) : i \in \alpha\}\Big) = \pi\Big(\prod\{u_i : i \in \alpha\}\Big).$$

These properties express the fact that $\pi$ is a generalized product operator: more precisely, the rules correspond to a generalized form of associativity. For instance, for every $u, v, w$ in $S$, $\pi(u\,\pi(vw)) = \pi(uvw) = \pi(\pi(uv)\,w)$. In this sense, every $\diamond$-semigroup can be seen as a semigroup with the product defined by $u.v = \pi(uv)$. The free $\diamond$-semigroup generated by a finite alphabet $A$ is $(A^\diamond, \prod)$.

Given two $\diamond$-semigroups $(S, \pi)$ and $(S', \pi')$, a mapping $\varphi$ from $S$ to $S'$ is a morphism of $\diamond$-semigroups if for every scattered linear ordering $\alpha$ and every family $(x_i)_{i \in \alpha}$ in $S$, $\varphi(\pi(\prod\{x_i : i \in \alpha\})) = \pi'(\prod\{\varphi(x_i) : i \in \alpha\})$. A language $K \subseteq A^\diamond$ is $\diamond$-recognizable if there exists a morphism of $\diamond$-semigroups $\varphi$ from $A^\diamond$ to a finite $\diamond$-semigroup saturating $K$, i.e. such that $\varphi^{-1}(\varphi(K)) = K$. As usual with recognizability, $\diamond$-recognizable languages are closed under union, intersection and complementation.

From now on, we denote $\pi(uv)$ simply by $uv$. More generally, given a word $u$ in $S^\diamond$, we do not distinguish between $u$ and $\pi(u)$. Similarly, we abbreviate $\pi(\prod\{u : i \in (\mathbb{N}, <)\})$ by $u^\omega$ and $\pi(\prod\{u : i \in (-\mathbb{N}, <)\})$ by $u^{-\omega}$. We also denote by $u^\zeta$ the value $u^{-\omega}u^\omega$.

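Example 5. Consider the set $S = (\{0,1\} \times \{0,1\}) \uplus \{\bot\}$. Define the product $.$ and the exponent mappings $\omega$ and $-\omega$ by, for every $x$ in $S$ and $a, b, a', b'$ in $\{0,1\}$,

$$
\begin{array}{ll}
\bot x = x \bot = \bot & (a,b)(a',b') = \begin{cases} \bot & \text{if } b = a' = 1\\ (a,b') & \text{else} \end{cases}\\[6pt]
\bot^{\omega} = (1,1)^{\omega} = \bot & (a,b)^{\omega} = \begin{cases} \bot & \text{if } a = b = 1\\ (a,1) & \text{else} \end{cases}\\[6pt]
\bot^{-\omega} = (1,1)^{-\omega} = \bot & (a,b)^{-\omega} = \begin{cases} \bot & \text{if } a = b = 1\\ (1,b) & \text{else.} \end{cases}
\end{array}
$$

Using Theorem 10 in [10], this $(S, .)$ together with the mappings $\omega$ and $-\omega$ uniquely defines a $\diamond$-semigroup $(S, \pi)$.

Let $u$ be in $\{a\}^\diamond$ of domain $\alpha$. Set $\varphi(u)$ to be $\bot$ if $\alpha$ is not complete. If $\alpha$ is complete, set $\varphi(u)$ to be $(a,b)$ where $a = 0$ if $\alpha$ has a minimal element, else $a = 1$, and $b = 0$ if $\alpha$ has a maximal element, else $b = 1$. This $\varphi$ is a morphism from $(\{a\}^\diamond, \prod)$ to $(S, \pi)$. It follows that the set of words in $\{a\}^\diamond$ of complete domain is $\diamond$-recognizable: it is equal to $\varphi^{-1}(\{0,1\} \times \{0,1\})$.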
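The finite structure of Example 5 is small enough to check mechanically. The Python sketch below encodes the product and the mappings $\omega$ and $-\omega$ (with $\bot$ represented by the string "bot", an arbitrary choice) and verifies associativity together with the identities $(xy)^\omega = x(yx)^\omega$ and $(xy)^{-\omega} = (yx)^{-\omega}y$ on this table; these identity checks are sanity tests we add, not statements from the paper. The printed values evaluate the orderings $\omega$, $\zeta$ and $\omega$ followed by $-\omega$: the first two are complete, while the third has a cut that is both a left and a right limit.

```python
from itertools import product

BOT = "bot"                                # stands for the element written ⊥
PAIRS = list(product((0, 1), repeat=2))    # (a, b): a = 0 iff a minimum exists, b = 0 iff a maximum exists
S = PAIRS + [BOT]


def mul(x, y):
    """(a, b)(a', b') = ⊥ if b = a' = 1, else (a, b'); ⊥ is absorbing."""
    if x == BOT or y == BOT:
        return BOT
    (a, b), (a2, b2) = x, y
    return BOT if b == a2 == 1 else (a, b2)


def omega(x):
    """x^ω: ⊥ for ⊥ and (1, 1), otherwise (a, 1)."""
    if x == BOT or x == (1, 1):
        return BOT
    return (x[0], 1)


def omega_minus(x):
    """x^{-ω}: ⊥ for ⊥ and (1, 1), otherwise (1, b)."""
    if x == BOT or x == (1, 1):
        return BOT
    return (1, x[1])


if __name__ == "__main__":
    assert all(mul(mul(x, y), z) == mul(x, mul(y, z)) for x in S for y in S for z in S)
    letter = (0, 0)                                  # image of a one-letter word
    print(omega(letter))                             # omega: a minimum, no maximum
    print(mul(omega_minus(letter), omega(letter)))   # zeta: complete, no min, no max
    print(mul(omega(letter), omega_minus(letter)))   # omega then -omega: not complete
```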





5.4 Equivalence of representations 

The following theorem was proved in [10]². A direct consequence of it is the closure under complementation of the languages of words indexed by scattered linear orderings accepted by priority automata.

Theorem 5 ([10]). Let $A$ be a finite alphabet. A language $L \subseteq A^\diamond$ is accepted by a priority automaton if and only if it is $\diamond$-recognizable.

The left-to-right implication is standard: one constructs a $\diamond$-semigroup which captures all the possible behaviours of the automaton over a word. Then there is no choice in the definition of the product and the morphism.

The difficult direction is, given a $\diamond$-recognizable language, to construct a priority automaton accepting it. The contribution here is to show that a natural way of constructing such an automaton is to follow the structure of a ramseyan split. Let us fix a $\diamond$-semigroup $(S, \pi)$ and a morphism of $\diamond$-semigroups $\varphi$ from $A^\diamond$ to $(S, \pi)$. By closure of priority automata under union, it is sufficient to show that for every $c \in S$ the language $\varphi^{-1}(c)$ is accepted by a priority automaton.

² In fact, the present theorem differs in the use of priority automata in place of automata using a Muller condition in limit transitions. For this reason the result here is new, but for a nonessential reason.
Let $k$ be a natural number, and set $L_k$ to be the set of words $u$ such that $\varphi_u$ admits a ramseyan split of height at most $k$. We show by induction on $k$ that for every $c \in S$, the language $L_{c,k} = L_k \cap \varphi^{-1}(c)$ is accepted by an automaton. According to the theorem for complete linear orderings, we have $\varphi^{-1}(c) = L_{c,3|S|}$. We also use the intermediate language $SD(e,k)$, for $e$ an idempotent of $S$, which is the set of words $u$ of domain $\alpha$ admitting a ramseyan split $s$ of height at most $k$ such that $s(\bot_\alpha) = s(\top_\alpha) = 1$ and $\varphi(u) = e$ (in particular, $SD(e,k) \subseteq L_{e,k}$).

The following lemma reduces the problem of describing the language $L_{c,k}$ to describing languages of the form $SD(e,k)$.

Lemma 16. Let $u \in A^\diamond$ be a word of at least two letters. Then $u$ belongs to $L_{c,k+1}$ iff there exist $a, b, e$ in $S$ and $\gamma \in \{0, 1, \omega, -\omega, \zeta\}$ such that $e^2 = e$, $c = a e^\gamma b$ and $u \in L_{a,k}\,(SD(e,k+1))^\gamma\,L_{b,k}$ (with the convention that $xy^0z = xz$).

Proof. From left to right. Let $u$ be an $\alpha$-word in $A^\diamond$ of length at least 2, and let $s$ be a ramseyan split of height at most $k+1$ of $(\alpha, \varphi_u)$. We argue on the nature of $s^{-1}(1)$.

If $s^{-1}(1)$ is empty, then choose arbitrarily a cut $c$ of $\alpha$, and set $s(c)$ to the new value $1$. This modified $s$ is still a ramseyan split of height $k+1$ of $(\alpha, \varphi_u)$, and we can apply the next case, for which $s^{-1}(1)$ is a singleton.

If $s^{-1}(1)$ is a singleton $\{c\}$, let $v$ be $u$ restricted to positions to the left of $c$, and $w$ be $u$ restricted to positions to the right of $c$. Obviously $u = vw$, and we have $u \in L_{\varphi(v),k}\,(SD(e,k+1))^0\,L_{\varphi(w),k}$ for any idempotent $e$.

Otherwise, $s^{-1}(1)$ contains at least two elements. There are four cases depending on the existence of a minimal (resp. a maximal) element in $s^{-1}(1)$. First case: if $s^{-1}(1)$ has both a minimal element $c$ and a maximal element $c'$, then let $a = \varphi_u(\bot, c)$, $e = \varphi_u(c, c')$ and $b = \varphi_u(c', \top)$. By definition of a ramseyan split, $e$ is an idempotent of $S$; furthermore, $\varphi(u) = aeb$. We obtain $u \in L_{a,k}\,SD(e,k+1)\,L_{b,k}$. Second case: $s^{-1}(1)$ has neither a minimal element nor a maximal element. Let $c$ be $\inf(s^{-1}(1))$ and $c'$ be $\sup(s^{-1}(1))$. Let $a = \varphi_u(\bot, c)$ and $b = \varphi_u(c', \top)$. Using the countability of the set of cuts of $\alpha$, we have a $\zeta$-indexed sequence $\cdots < x_n < x_{n+1} < \cdots$ in $s^{-1}(1)$ such that $\inf\{x_i : i \in \zeta\}$ is $c$ and $\sup\{x_i : i \in \zeta\}$ is $c'$. Let $e$ be $\varphi_u(x_0, x_1)$. The sequence of $x_i$'s shows that the restriction of $u$ to the positions between $c$ and $c'$ belongs to $(SD(e,k+1))^\zeta$. Furthermore, $e$ is an idempotent. We obtain $u \in L_{a,k}\,(SD(e,k+1))^\zeta\,L_{b,k}$. The two other cases are obtained as combinations of the first two, using $\omega$- and $-\omega$-indexed sequences. □
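On a finite word, only the first case of this proof occurs once $s^{-1}(1)$ has at least two elements. The Python sketch below, with a hypothetical helper `decompose`, extracts $a$, $e$ and $b$ from a split of a word over $\mathbb{Z}_3$ (written additively). The split used in the usage is the one of Lemma 10 with the numbering $1 \mapsto 1$, $0 \mapsto 2$, $2 \mapsto 3$, an illustrative choice; `None` stands for the value of the empty word.

```python
from functools import reduce


def product_or_none(letters, mul):
    """Product of a finite sequence of letters; None stands for the empty word."""
    return reduce(mul, letters) if letters else None


def decompose(word, split, mul):
    """First case of the proof of Lemma 16 on a finite word.

    `split` has one value per cut 0..len(word).  With c (resp. c2) the least
    (resp. greatest) cut of value 1, return (a, e, b): the values of the word
    before c, between c and c2, and after c2."""
    ones = [c for c, v in enumerate(split) if v == 1]
    if len(ones) < 2:
        raise ValueError("this sketch only covers splits with at least two 1-cuts")
    c, c2 = ones[0], ones[-1]
    return (product_or_none(word[:c], mul),
            product_or_none(word[c:c2], mul),
            product_or_none(word[c2:], mul))


if __name__ == "__main__":
    add3 = lambda x, y: (x + y) % 3
    word = [1, 1, 2, 1, 2, 2, 2]
    split = [2, 1, 3, 1, 3, 1, 2, 3]  # Lemma 10's split for Z_3, numbering 1->1, 0->2, 2->3
    a, e, b = decompose(word, split, add3)
    print(a, e, b, add3(e, e) == e)
```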

This lemma, together with the closure properties of languages accepted by priority automata, shows that it is sufficient to construct an automaton accepting $SD(e,k+1)$. For this, define the following languages:

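$$
\begin{array}{ll}
M_{e,k} = \{u \in L_k \setminus \{\varepsilon\} : \varphi(u) = e\}, & M^{-\omega}_{e,k} = \{u \in L_k : \varphi(u)\,e^{-\omega} = e\},\\[4pt]
M^{\zeta}_{e,k} = \{u \in L_k : e^{\omega}\,\varphi(u)\,e^{-\omega} = e\}, & M^{\omega}_{e,k} = \{u \in L_k : e^{\omega}\,\varphi(u) = e\}.
\end{array}
$$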

These languages can be obtained as unions of the $L_{a,k}$, together with languages consisting of a single-letter word or of the empty word. Hence, by induction hypothesis, there are automata accepting them. Below, we identify each automaton with the language it accepts.

In order to accept the language $SD(e,k+1)$, we construct a corresponding automaton $\mathcal{A}(e,k+1)$, depicted in Figure 2. It is a disjoint union of the automata accepting $M_{e,k}$, $M^{\omega}_{e,k}$, $M^{-\omega}_{e,k}$ and $M^{\zeta}_{e,k}$, and of a new state $t$ of priority $n$, the state $t$ being both initial and final. The value $n$ is chosen to be the highest priority of the automaton. New $\varepsilon$-transitions³ are added to this construction as depicted in Figure 2: arrows arriving from the left have the initial states of the automaton as destination, while arrows leaving to the right have the final states of the automaton as origin. Dashed arrows represent limit transitions. For instance, the leftmost one expresses the existence of a limit transition $(n, q)$ for $q$ an initial state of $M^{\omega}_{e,k}$: the automaton can go to state $q$ if the maximal priority appearing infinitely often to its left is $n$. The following lemma concludes the proof.

³ $\varepsilon$-transitions are just a notational convenience; in particular, there is no cycle of such transitions.

Fig. 2. The automaton $\mathcal{A}(e,k+1)$

Lemma 17. The automaton $\mathcal{A}(e,k+1)$ accepts the language $SD(e,k+1)$.

Proof. From right to left. Let $u$ be a word indexed by $\alpha$. Let $s$ be a ramseyan split of $\varphi_u$ corresponding to the membership of $u$ in $SD(e,k+1)$, i.e. such that $s(\bot) = s(\top) = 1$.

We construct a run $\rho$ of $\mathcal{A}(e,k+1)$ over $u$ in the following way. Set $\rho(x) = t$ whenever $s(x) = 1$. We define $\rho$ elsewhere by copying runs of the automata $M_{e,k}$, $M^{\omega}_{e,k}$, $M^{-\omega}_{e,k}$ and $M^{\zeta}_{e,k}$. More precisely, consider a maximal interval $I$ of cuts on which $s$ only takes values $\ge 2$. Let us define $\rho$ over $I$. Four cases happen depending on the nature of the interval: $I = [x,y]$, $[x,y[$, $]x,y]$ or $]x,y[$. We treat the case of $[x,y[$; the others are similar.

If $I = [x,y[$, this means that $s(x) \ge 2$, but $s(y) = 1$. As a consequence, there is a sequence $x_1 < x_2 < \cdots$ in $s^{-1}(1)$ indexed by $\omega$ such that $\sup\{x_i : i < \omega\} = x$ (this is possible because $\alpha$ is countable). It follows that $\varphi_u(x_1, x) = e^\omega$. Furthermore (by ramseyanity), $\varphi_u(x_1, y) = e$. We deduce $e^\omega \varphi_u(x,y) = e$. By induction hypothesis, we obtain that the restriction $v$ of $u$ to $I$ is accepted by $M^{\omega}_{e,k}$. We define $\rho$ to replicate the corresponding run over $I$, using the copy of $M^{\omega}_{e,k}$ contained in $\mathcal{A}(e,k+1)$. We have to prove that this choice indeed produces a run. Over $]x,y[$ this is a correct run since the original run was itself correct. It remains to show the correctness of the run to the left of $x$. But we already know that the maximal priority reaching $x$ from the left is $n$, since the sequence of the $x_i$'s tends to $x$ and, by construction, corresponds to the priority $n$, which is maximal. We conclude that there is a corresponding transition in $\mathcal{A}(e,k+1)$.

From left to right. Let $\rho$ be a run of $\mathcal{A}(e,k+1)$ over $u$ from $t$ to $t$. We aim at constructing a ramseyan split $s$ of $\varphi_u$ corresponding to the membership of $u$ in $SD(e,k+1)$. Let $J$ be $\rho^{-1}(t)$. We set $s(x)$ to be $1$ over $J$. Let $I$ be a maximal interval which does not intersect $J$. Once more there are four cases: $I = [x,y]$, $[x,y[$, $]x,y]$ or $]x,y[$. We treat the case of $[x,y[$, the others being similar.

If $I = [x,y[$, this means that $x \notin J$ but $y \in J$. Let $q$ be the state $\rho(x)$. Since $I$ is maximal, there exists an $\omega$-sequence $x_1 < x_2 < \cdots$ in $J$ of limit $x$. Since each $\rho(x_i)$ is $t$, whose priority is $n$, the maximal priority appearing infinitely often to the left of $x$ is $n$. Hence, there must be in $\mathcal{A}(e,k+1)$ a limit transition from $n$ to $q$. By inspecting the definition of $\mathcal{A}(e,k+1)$, this means that $q$ is the initial state either of $M^{\omega}_{e,k}$ or of $M^{\zeta}_{e,k}$. At $y$, the run assumes state $t$, and this state has been reached by an $\varepsilon$-transition either from the final state of $M_{e,k}$ or from the final state of $M^{\omega}_{e,k}$. Let $p'$ be this final state. We know that there is a run of $\mathcal{A}(e,k+1)$ from configuration $(x,q)$ to $(y,p')$ which does not visit state $t$ (by definition of $I$). It follows that $q$ is the initial state of $M^{\omega}_{e,k}$, $p'$ is its final state, and the run from $(x,q)$ to $(y,p')$ is an accepting run of $M^{\omega}_{e,k}$. By induction hypothesis, $(\varphi_u)|_I$ has factorisation height at most $k$. Let $s'$ be this factorisation. For all $x \in I$, let $s(x)$ be $s'(x) + 1$.
Let us show that this split is ramseyan. For $k$-neighbours with $k \ge 2$, ramseyanity is inherited from the induction hypothesis. What remains to be shown is that for every $x < y$ in $J$ (i.e. $x, y$ are $1$-neighbours), $\varphi_u(x,y) = e$. To make this relation reflexive and symmetric, we consider the relation $R$ defined by $xRy$ if $x = y$, or $x < y$ and $\varphi_u(x,y) = e$, or $y < x$ and $\varphi_u(y,x) = e$. We want to apply Lemma 15 on $(J, <)$ and the relation $R$. Let $X, Y \subseteq J$ be such that $X < Y$, $X^2 \subseteq R$, $Y^2 \subseteq R$ and $\bigcap_{x \in X, y \in Y} ]x,y[ \,\cap\, J = \emptyset$. Let $I = \bigcap_{x \in X, y \in Y} ]x,y[$; then $I$ is a maximal interval not intersecting $J$.

Once more there are four cases: $I = [x,y]$, $[x,y[$, $]x,y]$ or $]x,y[$. We treat the case of $I = [x,y[$. Fix $x_0 \in X$ and $y_0 \in Y$. We want to prove $\varphi_u(x_0, y_0) = e$. As $x \notin J$, there is an $\omega$-sequence $x_0 < x_1 < \cdots$ of limit $x$ with $\varphi_u(x_i, x_{i+1}) = e$ for all $i$. It follows that $\varphi_u(x_0, x) = e^\omega$. By construction, $s$ corresponds to a run of $M^{\omega}_{e,k}$ over $I$. It follows, by definition of $M^{\omega}_{e,k}$, that $e^\omega \varphi_u(x,y) = e$. We obtain $\varphi_u(x_0, y) = e$. Since furthermore, by hypothesis, $\varphi_u(y, y_0) = e$, we have $\varphi_u(x_0, y_0) = e$.

Lemma 15 concludes that for every $x < y$ in $J$, $\varphi_u(x,y) = e$. Hence $s$ is a ramseyan split for $\varphi_u$. □

6 Deterministic extension to the factorisation forest theorem 

In this section, we try to construct the split from 'left to right' in a 'deterministic' way. The notion of ramseyanity is no longer suitable in this context; the result would be false⁴. It is replaced by the notion of forward ramseyanity. The result, Theorem 6, only holds for ordinals.

6.1 The statement 

A split $s$ of height $N$ is forward ramseyan if for every $k = 1, \ldots, N$ and all $k$-neighbours $x < y$ and $x' < y'$,

$$\sigma(x,y) = \sigma(x,y)\cdot\sigma(x',y') .$$

So in particular, $\sigma(x,y)$ is an idempotent, but $\sigma(x,y)$ and $\sigma(x',y')$ may be different idempotents. In the terminology of Green's relations, $\sigma(x,y)$ and $\sigma(x',y')$ are $\mathcal{L}$-equivalent idempotents. A ramseyan split is always forward ramseyan, but the converse does not hold in general.

Below, we also identify the natural numbers with the corresponding ordinals. Furthermore, for $\sigma$ an additive labelling over an ordinal $\alpha$, and given $\beta < \alpha$, we denote by $\sigma|_{\leq\beta}$ the labelling $\sigma$ restricted to $[0,\beta]$.

Theorem 6. Let $(S,\cdot)$ be a finite semigroup. To every additive labelling $\sigma$ over an ordinal $\alpha$, one can associate a forward ramseyan split $s_{\alpha,\sigma}$ of $(\alpha,\sigma)$ of height at most $|S|$. Furthermore, for all additive labellings $\sigma$ and $\sigma'$ over the respective ordinals $\alpha$ and $\alpha'$, and every ordinal $\beta < \min\{\alpha,\alpha'\}$,

$$\text{if } \sigma|_{\leq\beta} = \sigma'|_{\leq\beta} \text{ then } s_{\alpha,\sigma}(\beta) = s_{\alpha',\sigma'}(\beta) \qquad \text{(determinism property)}.$$

Furthermore, under the same hypothesis, over finite linear orderings, the forward ramseyan split can be computed via monadic formulae.

Proposition 1 (definable variant of Theorem 6). Given a finite semigroup $(S,\cdot)$, there exist monadic closed formulae $\Theta_1, \ldots, \Theta_{|S|}$ such that for every ordinal $\alpha$ and additive labelling $\sigma$ from $\alpha$ to $S$, the split $s$ defined for every $\beta \in \alpha$ by

$$s(\beta) = \text{the least } n \text{ such that } (\beta+1, \sigma|_{\leq\beta}) \models \Theta_n$$

is forward ramseyan.



4 Consider the semigroup $(\{a,b\},\cdot)$ defined by $ab = aa = a$ and $ba = bb = b$.
Proof. (Idea) Implement the construction of the proof of Theorem 6 via monadic formulae. □
Note that, as a consequence of Proposition 1, the mapping $s$ satisfies the determinism property.
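The two notions can be checked mechanically on finite words, that is, on finite ordinals $\{0, \ldots, n-1\}$ with $\sigma(x,y)$ the product of the letters at positions $x, \ldots, y-1$. The brute-force sketch below is our own illustration: it assumes our reading of the notion of neighbours used throughout this section ($x < y$ are $k$-neighbours when $s(x) = s(y) = k$ and $s(z) \geq k$ for all $z \in [x,y]$), and all function names are hypothetical. In the semigroup of footnote 4 we have $uv = u$ for all $u, v$, so every split is forward ramseyan, whereas the constant split of the word $abab$ is not ramseyan.

```python
from itertools import combinations


def sigma(word, mul, x, y):
    """Additive labelling of a finite word: product of the letters at positions x..y-1 (x < y)."""
    acc = word[x]
    for letter in word[x + 1:y]:
        acc = mul(acc, letter)
    return acc


def neighbour_pairs(s, k):
    """Pairs x < y with s(x) = s(y) = k and s(z) >= k for every z in [x, y]."""
    return [(x, y) for x, y in combinations(range(len(s)), 2)
            if s[x] == s[y] == k and min(s[x:y + 1]) >= k]


def is_forward_ramseyan(word, mul, s):
    """sigma(x, y) = sigma(x, y) . sigma(x', y') for all k-neighbour pairs, for every k."""
    for k in set(s):
        values = [sigma(word, mul, x, y) for x, y in neighbour_pairs(s, k)]
        if any(mul(a, b) != a for a in values for b in values):
            return False
    return True


def is_ramseyan(word, mul, s):
    """All k-neighbour pairs carry one and the same idempotent, for every k."""
    for k in set(s):
        values = {sigma(word, mul, x, y) for x, y in neighbour_pairs(s, k)}
        if len(values) > 1 or any(mul(e, e) != e for e in values):
            return False
    return True


def left_zero(u, v):
    """The semigroup of footnote 4: ab = aa = a and ba = bb = b, i.e. uv = u."""
    return u


if __name__ == "__main__":
    constant = [1, 1, 1, 1]
    print(is_forward_ramseyan("abab", left_zero, constant))  # True
    print(is_ramseyan("abab", left_zero, constant))          # False
```

A split whose levels are pairwise distinct has no neighbours at all, so it passes both checks trivially; the interest of the theorems lies in the bound on the height.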

6.2 Proof of Theorem 6

Once more, we perform a case analysis. 

Case of a single $\mathcal{H}$-class.

Lemma 18. Let $H$ be an $\mathcal{H}$-class in $S$ such that $(H,\cdot)$ is a group. For every ordinal $\beta$ such that $\sigma(\beta) \subseteq H$, there exists a ramseyan split $s^H_{\beta,\sigma}$ of height at most $|H|$. Furthermore $s^H$ satisfies the determinism property.

This is exactly the proof of Lemma [TU], in which one always chooses $x_0$ to be $0_\beta$.
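For finite words, one simple way to obtain a split with the properties stated in Lemma 18 is sketched below; it is our own illustration, not a transcription of the earlier proof. Position $x$ receives the level of the prefix value $\sigma(0_\beta, x)$, with $\sigma(0_\beta, 0_\beta)$ taken to be the neutral element. Two positions $x < y$ on the same level have equal prefix values, so $\sigma(x,y) = \sigma(0_\beta,x)^{-1}\sigma(0_\beta,y)$ is the neutral element, the only idempotent of the group; and the level of $x$ only depends on the letters before $x$, which is the determinism property. The group $\mathbb{Z}/3$, the example word and all names are assumptions made for the example.

```python
def group_split(word, mul, unit, elements):
    """Level of position x = 1-based index of the prefix product sigma(0, x) in `elements`.

    sigma(0, 0) is taken to be `unit` (the neutral element of the group), so the
    level of x depends only on word[:x]; the height is at most len(elements).
    """
    index = {g: i + 1 for i, g in enumerate(elements)}
    split, prefix = [], unit
    for letter in word:
        split.append(index[prefix])   # level of the current position
        prefix = mul(prefix, letter)  # prefix product up to and including it
    return split


def sigma(word, mul, x, y):
    """Product of the letters at positions x..y-1 (x < y)."""
    acc = word[x]
    for letter in word[x + 1:y]:
        acc = mul(acc, letter)
    return acc


def add_mod3(a, b):
    """The cyclic group Z/3, written additively; its neutral element is 0."""
    return (a + b) % 3


EXAMPLE_WORD = [1, 2, 2, 1, 0, 1, 1, 1]

if __name__ == "__main__":
    print(group_split(EXAMPLE_WORD, add_mod3, 0, [0, 1, 2]))  # [1, 2, 1, 3, 1, 1, 2, 3]
```

Running the function on a prefix of the word returns the corresponding prefix of the split, which is the finite-word form of the determinism property.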
Case of a single $\mathcal{L}$-class.

Lemma 19. Let $L$ be an $\mathcal{L}$-class in a regular $\mathcal{D}$-class. For every ordinal $\beta$ such that $\sigma(\beta) \subseteq L$, there exists a forward ramseyan split $s^L_{\beta,\sigma}$ of height at most $|L|$. Furthermore $s^L$ satisfies the determinism property.

We require the following result. 

Fact 20. There is an $\mathcal{H}$-class $H \subseteq L$ which is a group, and a mapping $f : L \to H$ such that:

– for every $a, b$ in $L$, if $ab \in L$ then $f(ab) = f(a)f(b)$, and,

– for every $\mathcal{H}$-class $H' \subseteq L$, $f|_{H'}$ is a bijection from $H'$ onto $H$.

Proof. Let $H_1, \ldots, H_n$ be the $\mathcal{H}$-classes included in $L$. By Fact 9 we can assume that $H_1, \ldots, H_k$ are groups, while for every $a, b$ in $H_i$ with $i > k$, $ab \notin L$. By the regularity hypothesis and Fact El, $k \geq 1$. Let $L' = H_1 \cup \cdots \cup H_k$.

Let $a, b$ be in $L$; we claim that $ab \in L$ iff $b \in L'$. Indeed, if $b \in L'$, let $e$ be the neutral element of the group containing $b$. Since $e \mathrel{\mathcal{L}} a$, $e = xa$ for some $x$. Hence, $b = eb = xab$, and we deduce $ab \mathrel{\mathcal{L}} b$. Conversely, suppose $ab$ in $L$; then $ab \mathrel{\mathcal{R}} a$. Hence, $a = abc$ for some $c$. But then $abcbc = a$. Hence $bc$ belongs to $L'$. But $bc \mathrel{\mathcal{R}} b$. Hence $b \in L'$.

Let $H$ be $H_1$. If $k = 0$, then for all $a, b$ in $L$, $ab \notin L$, and one can construct the mapping arbitrarily using Fact El. Else, let $e_i$ be the neutral element of $H_i$ for $i \leq k$. Let $i, j \leq k$. Since $e_i \mathrel{\mathcal{L}} e_j$, $e_i = xe_j$ for some $x$. Hence $e_ie_j = xe_je_j = xe_j = e_i$. For every $a \in L$, let $f(a) = ae_1$. According to the claim above, $f$ is a mapping from $L$ to $H_1$. Assume $a, b$ in $L$ such that $ab \in L$. According to the claim above, $b \in L'$, i.e. $b \in H_i$ for some $i \leq k$. Also, as $a \mathrel{\mathcal{L}} e_1$, $a = xe_1$ for some $x$. We have $f(a)f(b) = ae_1be_1 = xe_1e_1be_1 = xe_1be_1 = abe_1 = f(ab)$.

The fact that $f|_{H_i}$ is a bijection from $H_i$ to $H_1$ is known as Green's lemma. □

We can now prove Lemma 19.

Proof. Let $H$ and $f$ be obtained by Fact 20. For $x < y$ in $\beta$, let $\sigma'(x,y)$ be $f(\sigma(x,y))$. The first property of $f$ makes $\sigma'$ an additive labelling from $\beta$ to $H$, such that $\sigma'(\beta) \subseteq H$. Applying the case of a single $\mathcal{H}$-class above, we obtain a split $s^H_{\beta,\sigma'}$, forward ramseyan for $(\beta,\sigma')$. There are two different cases.

Either all the $\mathcal{H}$-classes are groups. In this case, one sets $s^L_{\beta,\sigma}$ to be $s^H_{\beta,\sigma'}$. Let us show that $s^L$ is forward ramseyan. Indeed, consider $x < y$ and $x' < y'$ to be $k$-neighbours for some $k$. This means that $f(\sigma(x,y))$ and $f(\sigma(x',y'))$ are equal to the neutral element $1$ of $H$. Since the $\mathcal{H}$-classes of $\sigma(x,y)$ (resp. of $\sigma(x',y')$) are groups isomorphic to $H$, we have that $\sigma(x,y)$ and $\sigma(x',y')$ are idempotents of $S$. Since $\sigma(x,y) \mathrel{\mathcal{L}} \sigma(x',y')$, $\sigma(x,y) = a\sigma(x',y')$ for some $a \in S$. Hence,
$\sigma(x,y)\sigma(x',y') = a\sigma(x',y')^2 = a\sigma(x',y') = \sigma(x,y)$.

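Otherwise, there exists an $\mathcal{H}$-class in $L$ which is not a group. This means that $L$ contains at least two $\mathcal{H}$-classes. Define $s^L_{\beta,\sigma}(0_\beta) = 1$, and $s^L_{\beta,\sigma}(x) = s^H_{\beta,\sigma'}(x) + 1$ elsewhere. For all $0_\beta < x < y$, since $\sigma(0_\beta,x)\sigma(x,y) = \sigma(0_\beta,y) \in L$, the claim in the proof of Fact 20 shows that $\sigma(x,y)$ lies in a group $\mathcal{H}$-class; hence the split $s^L$ defined this way is forward ramseyan for $(\beta,\sigma)$, as above. It has height at most $|H| + 1 \leq 2|H| \leq |L|$.

And this construction satisfies the determinism property. □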

Case of a single $\mathcal{D}$-class.

Lemma 21. Let $D$ be a regular $\mathcal{D}$-class. For every ordinal $\beta$ such that $\sigma(\beta) \subseteq D$, there exists a forward ramseyan split $s^D_{\beta,\sigma}$ of height at most $|D|$. Furthermore $s^D$ satisfies the determinism property.

Proof. We prove the property for every $\mathcal{L}$-closed $E \subseteq D$. This is done by induction on the cardinality of $E$. If $E$ is an $\mathcal{L}$-class, Lemma 19 concludes.

Else, let $L$ be an $\mathcal{L}$-class in $E$. Let $\gamma = \{0_\beta\} \cup \{x \in \beta : \sigma(0_\beta,x) \in E \setminus L\}$. By Fact EJ, for every $x < y$ in $\gamma$, $\sigma(x,y) \in E \setminus L$. One can apply the induction hypothesis, and obtain a split $s^{E \setminus L}$ which is forward ramseyan for $(\gamma,\sigma)$ and of height at most $|E| - |L|$. Similarly, for every $x < y$ in $\beta \setminus \gamma$, $\sigma(x,y) \in L$. By Lemma 19 one obtains a split $s^L$ which is forward ramseyan for $(\beta \setminus \gamma, \sigma)$, of height at most $|L|$. Let us define the split $s^E$ by $s^E(x) = s^{E \setminus L}(x) + |L|$ if $x \in \gamma$, else $s^E(x) = s^L(x)$ if $x \in \beta \setminus \gamma$. The mapping $s^E$ is forward ramseyan for $(\beta,\sigma)$ as an inheritance of the forward ramseyanity of $s^L$ and $s^{E \setminus L}$. It has height at most $|L| + |E \setminus L| = |E|$. □

For the proof of Theorem 6, we use Lemma 21 with $E = D$, and the same trick as for ordinal ramseyan splits.

7 Compaction of additive labellings 

A labelling maps pairs of elements to a finite set (the semigroup): it is defined via a finite number of binary predicates. In this section we show that the use of (forward) ramseyan factorisations makes it possible to encode all this information into a finite number of unary predicates. Furthermore, we show that the whole additive labelling can be reconstructed from those unary predicates via first-order formulas. We call this technique compaction.

As above, there are two variants of the technique: one usable over complete linear orderings (Section 7.1), and one usable over ordinals, which furthermore satisfies the determinism property (Section 7.2). In Section 7.3 we apply this technique to prove a new result on monadic interpretations applied to trees, and in Section 7.4 we briefly describe how this result impacts the theory of finitely presentable infinite structures.

7.1 Compaction of additive labellings over complete linear orderings

We prove here the following statement. 

Theorem 7. For every finite semigroup $(S,\cdot)$ and $a$ in $S$, there exists a first-order formula $\mathrm{labelling}_a(x,y)$ of free variables $x, y$, which uses the ordering relation $<$ and unary predicates $p_1, \ldots, p_N$ with $N = \lceil (6|S|+2)\log_2(|S|) \rceil$ such that the following holds⁵.

For every complete linear ordering $\alpha$ and additive labelling $\sigma$ from $\alpha$ to $S$, there exist subsets $X_1, \ldots, X_N$ of $\alpha$ such that for all $a$ in $S$ and $x < y$ in $\alpha$:

$$\sigma(x,y) = a \quad\text{iff}\quad (\alpha, X_1, \ldots, X_N) \models \mathrm{labelling}_a(x,y) ,$$

in which for every $i = 1, \ldots, N$, $p_i$ is interpreted as $X_i$.
5 We did not try to optimize the value of $N$.
In this proof, we first define the values of $X_1, \ldots, X_N$, before giving the formulae.

Using Theorem El, one obtains a ramseyan split $s$ for $(\alpha,\sigma)$ of height at most $3|S|$. To every element $x$ in $\alpha$ and $k$ with $1 \leq k \leq 3|S|$, we furthermore attach some pieces of information concerning the value of $\sigma$. For every $k$ with $1 \leq k \leq 3|S|$, there are two such pieces of information, $l_k(x)$ and $r_k(x)$, taking values in $S$, and corresponding to a compaction of what is happening to the left of $x$ and to the right of $x$ respectively. We give the definition of $l_k(x)$, the case of $r_k(x)$ being symmetric.

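$$
l_k(x) = \begin{cases}
\text{any value} & \text{if } L_k(x) = \emptyset,\\
\sigma(z,x) & \text{if } L_k(x) \text{ has a maximum } z,\\
a & \text{else, with } a \text{ such that } \forall y \in L_k(x).\ \exists z \in L_k(x).\ z > y \wedge \sigma(z,x) = a,
\end{cases}
$$

where $L_k(x) = \{y < x : s(y) = k\}$.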

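Note that a consequence of this definition is that, whenever $x < y$ are $k$-neighbours, then $\sigma(x,y) = l_{s(y)}(y)$ (the element $z$ used in the definition of $l_{s(y)}(y)$ lies in $[x,y[$, so $z < y$ are also $k$-neighbours, and ramseyanity gives $\sigma(z,y) = \sigma(x,y)$). Finally, it is simple to establish that $N = \lceil (6|S|+2)\log_2(|S|) \rceil$ bits are sufficient for coding $(s(x), l_1(x), \ldots, l_{s(x)}(x), r_1(x), \ldots, r_{s(x)}(x))$.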
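Over a finite ordinal, every non-empty $L_k(x)$ has a maximum, so only the first two cases of the definition of $l_k(x)$ arise. The sketch below computes $l_k(x)$ and, under our reading of 'symmetric', $r_k(x) = \sigma(x, \min R_k(x))$ with $R_k(x) = \{y > x : s(y) = k\}$, for $k \leq s(x)$ only, as in the coded tuple; `None` plays the role of 'any value'. The example split over $\mathbb{Z}/3$ is ramseyan because every pair of positions on the same level is labelled $0$; names and data are hypothetical.

```python
def sigma(word, mul, x, y):
    """Product of the letters at positions x..y-1 (x < y)."""
    acc = word[x]
    for letter in word[x + 1:y]:
        acc = mul(acc, letter)
    return acc


def compact(word, mul, s):
    """Return dicts l, r indexed by (k, x) for 1 <= k <= s(x).

    l[(k, x)] = sigma(max L_k(x), x) with L_k(x) = {y < x : s(y) = k};
    r[(k, x)] = sigma(x, min R_k(x)) with R_k(x) = {y > x : s(y) = k};
    None plays the role of 'any value' when the set is empty.
    """
    n = len(word)
    l, r = {}, {}
    for x in range(n):
        for k in range(1, s[x] + 1):
            left = [z for z in range(x) if s[z] == k]
            right = [z for z in range(x + 1, n) if s[z] == k]
            l[(k, x)] = sigma(word, mul, left[-1], x) if left else None
            r[(k, x)] = sigma(word, mul, x, right[0]) if right else None
    return l, r


def add_mod3(a, b):
    return (a + b) % 3


EXAMPLE_WORD = [1, 2, 2, 1, 0, 1, 1, 1]
EXAMPLE_SPLIT = [1, 2, 1, 3, 1, 1, 2, 3]  # ramseyan: same-level pairs are labelled 0

if __name__ == "__main__":
    l, r = compact(EXAMPLE_WORD, add_mod3, EXAMPLE_SPLIT)
    print(l[(1, 3)], r[(1, 3)])  # 2 1
```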

We now have to construct first-order formulas which reconstruct the value of $\sigma(x,y)$ for every $x < y$ in $\alpha$. We do not provide the formulae explicitly, but instead describe functions which can easily be translated into first-order logic. Let us treat first the 'ascending case', i.e. compute $\sigma(x,y)$ for $x < y$ with $s(x) \leq s(y)$ and $s(z) \geq s(x)$ for all $z$ in $[x,y]$.

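Lemma 22. For every $x < y$ in $\alpha$, if $s(x) \leq s(y)$ and $s(z) \geq s(x)$ for all $z$ in $[x,y]$, then $\sigma(x,y) = \mathrm{asc}(x,y)$ with:

$$
\mathrm{asc}(x,y) = \begin{cases}
l_{s(x)}(y) & \text{if } s(z) > s(x) \text{ for all } z \in\, ]x,y[ ,\\
l_{s(x)}(z)\, l_{s(x)}(y) & \text{else, for some } z \in\, ]x,y[ \text{ with } s(z) = s(x) .
\end{cases}
$$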



Proof. Two cases can happen. If $s(z) > s(x)$ for all $z$ in $]x,y[$, then $[x,y[\, \cap\, s^{-1}(s(x)) = \{x\}$. Hence, by definition, $l_{s(x)}(y) = \sigma(x,y)$.

Else, there exists $x'$ in $]x,y[\, \cap\, s^{-1}(s(x))$. By definition of $l_{s(x)}(y)$, there exists $y'$ in $[x',y[\, \cap\, s^{-1}(s(x))$ such that $l_{s(x)}(y) = \sigma(y',y)$. Let now $z$ be the element used in the definition of $\mathrm{asc}(x,y)$. By definition of $l_{s(x)}(z)$, there exists $z'$ in $[x,z[\, \cap\, s^{-1}(s(x))$ such that $\sigma(z',z) = l_{s(x)}(z)$. Finally, using the ramseyanity of $s$ (both $x < y'$ and $z' < z$ are pairs of $s(x)$-neighbours), we deduce $\sigma(x,y') = \sigma(z',z) = l_{s(x)}(z)$. Overall, $\sigma(x,y) = \sigma(x,y')\sigma(y',y) = l_{s(x)}(z)\, l_{s(x)}(y) = \mathrm{asc}(x,y)$. □

Naturally, there is a corresponding definition of $\mathrm{desc}(x,y)$ satisfying $\sigma(x,y) = \mathrm{desc}(x,y)$ whenever $s(x) \geq s(y)$ and $s(z) \geq s(y)$ for all $z$ in $[x,y]$. Combining asc and desc, we obtain the following.

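Lemma 23. For every $x < y$ in $\alpha$, $\sigma(x,y) = \mathrm{labelling}(x,y)$ with:

$$
\mathrm{labelling}(x,y) = \begin{cases}
\mathrm{asc}(x,y) & \text{if } s(x) \leq s(y) \text{ and } s(z) \geq s(x) \text{ for all } z \in [x,y],\\
\mathrm{desc}(x,y) & \text{if } s(x) \geq s(y) \text{ and } s(z) \geq s(y) \text{ for all } z \in [x,y],\\
\mathrm{desc}(x,z)\,\mathrm{asc}(z,y) & \text{else, for } z \in\, ]x,y[ \text{ such that } s(z') \geq s(z) \text{ for all } z' \in [x,y].
\end{cases}
$$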



Proof. There are three cases, corresponding to the three items of the definition. The first two are treated by Lemma 22 and its variant for $\mathrm{desc}(x,y)$. In the third case, one finds $z$ in $]x,y[$ such that $s(z)$ is minimum. We use the variant of Lemma 22 for desc between $x$ and $z$, and Lemma 22 itself between $z$ and $y$, as well as the additivity of the labelling $\sigma$, for obtaining:

$$\sigma(x,y) = \sigma(x,z)\sigma(z,y) = \mathrm{desc}(x,z)\,\mathrm{asc}(z,y) = \mathrm{labelling}(x,y) .$$

□ 
It is not difficult at this point to check that the definition of labelling can be translated, for every $a$ in $S$, into a first-order formula $\mathrm{labelling}_a$ using as predicates the ordering relation $<$ as well as unary predicates $p_1, \ldots, p_N$ encoding the value of $(s(x), l_1(x), \ldots, l_{s(x)}(x), r_1(x), \ldots, r_{s(x)}(x))$, and satisfying the conclusion of Theorem 7.
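Lemmas 22 and 23 can be tested end to end on short finite words. The sketch below rests on our own assumptions: positions are $0, \ldots, n-1$; a ramseyan split is found by exhaustive search (which always succeeds, since a split with pairwise distinct levels has no neighbours) rather than by the construction used above; and desc is our symmetric reading, $\mathrm{desc}(x,y) = r_{s(y)}(x)$ if no $z \in\, ]x,y[$ has $s(z) = s(y)$, and $r_{s(y)}(x)\, r_{s(y)}(z)$ for such a $z$ otherwise. The function `labelling` only reads $s$, the $l_k$, the $r_k$ and the product; `reconstruction_agrees` compares it with $\sigma$ on every pair $x < y$. All names are hypothetical.

```python
from itertools import combinations, product


def sigma(word, mul, x, y):
    """Product of the letters at positions x..y-1 (x < y)."""
    acc = word[x]
    for letter in word[x + 1:y]:
        acc = mul(acc, letter)
    return acc


def is_ramseyan(word, mul, s):
    """All k-neighbour pairs carry one and the same idempotent, for every k."""
    for k in set(s):
        values = {sigma(word, mul, x, y) for x, y in combinations(range(len(s)), 2)
                  if s[x] == s[y] == k and min(s[x:y + 1]) >= k}
        if len(values) > 1 or any(mul(e, e) != e for e in values):
            return False
    return True


def find_ramseyan_split(word, mul):
    """Exhaustive search by increasing height (height len(word) always succeeds)."""
    for height in range(1, len(word) + 1):
        for s in product(range(1, height + 1), repeat=len(word)):
            if is_ramseyan(word, mul, s):
                return list(s)


def compact(word, mul, s):
    """l[(k, x)], r[(k, x)] for k <= s(x), as in the definition of l_k and r_k."""
    l, r = {}, {}
    for x in range(len(word)):
        for k in range(1, s[x] + 1):
            left = [z for z in range(x) if s[z] == k]
            right = [z for z in range(x + 1, len(word)) if s[z] == k]
            l[(k, x)] = sigma(word, mul, left[-1], x) if left else None
            r[(k, x)] = sigma(word, mul, x, right[0]) if right else None
    return l, r


def labelling(mul, s, l, r, x, y):
    """Lemma 23: recompute sigma(x, y) from s, l, r and the product alone."""
    def asc(x, y):  # s(x) is minimal on [x, y]
        k = s[x]
        mid = [z for z in range(x + 1, y) if s[z] == k]
        return mul(l[(k, mid[0])], l[(k, y)]) if mid else l[(k, y)]

    def desc(x, y):  # s(y) is minimal on [x, y]
        k = s[y]
        mid = [z for z in range(x + 1, y) if s[z] == k]
        return mul(r[(k, x)], r[(k, mid[0])]) if mid else r[(k, x)]

    low = min(s[x:y + 1])
    if s[x] == low:
        return asc(x, y)
    if s[y] == low:
        return desc(x, y)
    z = next(z for z in range(x + 1, y) if s[z] == low)
    return mul(desc(x, z), asc(z, y))


def reconstruction_agrees(word, mul):
    """True iff labelling(x, y) == sigma(x, y) for every x < y."""
    s = find_ramseyan_split(word, mul)
    l, r = compact(word, mul, s)
    return all(labelling(mul, s, l, r, x, y) == sigma(word, mul, x, y)
               for x, y in combinations(range(len(word)), 2))
```

For instance, `reconstruction_agrees("abbab", lambda u, v: u)` runs the check in the semigroup of footnote 4. The exhaustive search is exponential and only meant for words of a handful of letters.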



7.2 Deterministic compaction of additive labellings over ordinals 

We now state a result similar to Theorem 7 in the ordinal case, which satisfies a form of determinism property. The statement itself is difficult to parse; it is the statement of Theorem 7 into which the determinism feature has been injected.

Theorem 8. For every finite semigroup $(S,\cdot)$ and $a$ in $S$, there exists a first-order formula $\mathrm{labelling}_a(x,y)$ of free variables $x, y$, which uses the ordering relation $<$ and unary predicates $p_1, \ldots, p_N$ with $N = \lceil (2|S|+1)\log_2(|S|) \rceil$ such that the following holds. For every ordinal $\alpha$ and additive labelling $\sigma$ from $\alpha$ to $S$, there exist subsets $X_1(\alpha,\sigma), \ldots, X_N(\alpha,\sigma)$ of $\alpha$ such that for all $a$ in $S$ and $x < y$ in $\alpha$:

$$\sigma(x,y) = a \quad\text{iff}\quad (\alpha, X_1(\alpha,\sigma), \ldots, X_N(\alpha,\sigma)) \models \mathrm{labelling}_a(x,y) ,$$

in which for every $i = 1, \ldots, N$, $p_i$ is interpreted as $X_i(\alpha,\sigma)$.

Furthermore, for all additive labellings $\sigma$ and $\sigma'$ over the respective ordinals $\alpha$ and $\alpha'$, and every ordinal $\beta < \min(\alpha,\alpha')$,

$$\text{if } \sigma|_{\leq\beta} = \sigma'|_{\leq\beta} \text{ then for all } i,\ \beta \in X_i(\alpha,\sigma) \text{ iff } \beta \in X_i(\alpha',\sigma') \qquad \text{(determinism property)}.$$

Let $s$ be the forward ramseyan split of $(\alpha,\sigma)$ of height at most $|S|$ obtained by Theorem 6. Let us define $l_k(x)$ as in the previous section (this time only for $k = 1, \ldots, |S|$). Without loss of generality, we assume that $S$ has a neutral element, denoted $1$, and we set $\sigma(x,x) = 1$ for every $x$. Define:

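$$\mathrm{labelling}(x,y) = \mathrm{labelling}^1(x,y) ,$$

with $\mathrm{labelling}^n$ defined by induction for all $n = 1, \ldots, |S|+1$ by:

$$
\mathrm{labelling}^n(x,y) = \begin{cases}
1 & \text{if } n = |S|+1,\\
\mathrm{labelling}^{n+1}(x,y) & \text{else if } [x,y[\, \cap\, s^{-1}(n) = \emptyset,\\
\mathrm{labelling}^{n+1}(x,z)\, l_n(y) & \text{else if } [x,y[\, \cap\, s^{-1}(n) = \{z\},\\
\mathrm{labelling}^{n+1}(x,z_0)\, l_n(z_1)\, l_n(y) & \text{else if } [x,y[\, \cap\, s^{-1}(n) = \{z_0 < z_1 < \cdots\}.
\end{cases}
$$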



In this definition, we abbreviate by $[x,y[\, \cap\, s^{-1}(n) = \{z_0 < z_1 < \cdots\}$ the fact that $z_0$ is the minimal element, and $z_1$ the minimal element different from $z_0$, in $[x,y[\, \cap\, s^{-1}(n)$. Those two elements exist since $\alpha$ is an ordinal and since the cases of $[x,y[\, \cap\, s^{-1}(n)$ being empty or a singleton are treated above.

The correctness is then stated by the following lemma. 

Lemma 24. For every $x \leq y$ in $\alpha$ and $n = 1, \ldots, |S|+1$, if $s(z) \geq n$ for all $z$ in $[x,y[$, then

$$\mathrm{labelling}^n(x,y) = \sigma(x,y) .$$

Proof. The proof is done by downward induction on $n$. For $n = |S|+1$, no $z$ satisfies $s(z) \geq n$, hence $[x,y[$ has to be empty. It follows that $x = y$, and consequently $\mathrm{labelling}^n(x,y) = 1 = \sigma(x,y)$.
Else, let $n \leq |S|$. Assume the property true for $n+1$ and consider $x \leq y$. Let $E$ be $[x,y[\, \cap\, s^{-1}(n)$. If $E$ is empty, this means that $s(z) \geq n+1$ for all $z$ in $[x,y[$, and by induction hypothesis $\mathrm{labelling}^{n+1}(x,y) = \sigma(x,y)$. Hence, $\mathrm{labelling}^n(x,y) = \sigma(x,y)$. If $E$ is the singleton $\{z\}$, this means that $l_n(y) = \sigma(z,y)$. It follows that $\mathrm{labelling}^n(x,y) = \mathrm{labelling}^{n+1}(x,z)\, l_n(y) = \sigma(x,z)\sigma(z,y) = \sigma(x,y)$. Finally, if $E = \{z_0 < z_1 < \cdots\}$: by definition of $l_n(z_1)$, $l_n(z_1) = \sigma(z_0,z_1)$. By induction hypothesis, $\mathrm{labelling}^{n+1}(x,z_0) = \sigma(x,z_0)$. Furthermore, by definition of $l_n(y)$, there is some $z \geq z_1$ in $E$ such that $l_n(y) = \sigma(z,y)$. Since $z_0 < z_1$ and (when $z \neq z_1$) $z_1 < z$ are pairs of $n$-neighbours, the forward ramseyanity of $s$ gives $\sigma(z_0,z) = \sigma(z_0,z_1)\sigma(z_1,z) = \sigma(z_0,z_1)$. Altogether, this leads to:

$$
\begin{aligned}
\mathrm{labelling}^n(x,y) &= \mathrm{labelling}^{n+1}(x,z_0)\, l_n(z_1)\, l_n(y)\\
&= \sigma(x,z_0)\sigma(z_0,z_1)\sigma(z,y)\\
&= \sigma(x,z_0)\sigma(z_0,z)\sigma(z,y)\\
&= \sigma(x,y) .
\end{aligned}
$$

□ 
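The recursion defining $\mathrm{labelling}^n$ only uses $s$, the left data $l_n$ and the product. Below is a brute-force sketch on short finite words, under our own assumptions: the monoid has a neutral element `unit` and $\sigma(x,x)$ is `unit`; the forward ramseyan split is found by exhaustive search rather than by Theorem 6, so the sketch does not itself enjoy the determinism property; and the recursion stops at $n = h + 1$, where $h$ is the height of the split found, in place of $|S| + 1$. All names are hypothetical.

```python
from itertools import combinations, product


def sigma(word, mul, unit, x, y):
    """Product of the letters at positions x..y-1, with sigma(x, x) = unit."""
    acc = unit
    for letter in word[x:y]:
        acc = mul(acc, letter)
    return acc


def is_forward_ramseyan(word, mul, unit, s):
    """sigma(x, y) = sigma(x, y) . sigma(x', y') for all k-neighbour pairs, for every k."""
    for k in set(s):
        values = [sigma(word, mul, unit, x, y) for x, y in combinations(range(len(s)), 2)
                  if s[x] == s[y] == k and min(s[x:y + 1]) >= k]
        if any(mul(a, b) != a for a in values for b in values):
            return False
    return True


def find_forward_split(word, mul, unit):
    """Exhaustive search by increasing height (height len(word) always succeeds)."""
    for height in range(1, len(word) + 1):
        for s in product(range(1, height + 1), repeat=len(word)):
            if is_forward_ramseyan(word, mul, unit, s):
                return list(s)


def left_data(word, mul, unit, s):
    """l[(k, x)] = sigma(max{z < x : s(z) = k}, x) for every level k (None if no such z)."""
    l = {}
    for x in range(len(word)):
        for k in range(1, max(s) + 1):
            left = [z for z in range(x) if s[z] == k]
            l[(k, x)] = sigma(word, mul, unit, left[-1], x) if left else None
    return l


def labelling(n, x, y, s, l, mul, unit):
    """labelling^n(x, y); assumes s(z) >= n for every z in [x, y[ (as in Lemma 24)."""
    if n == max(s) + 1:
        return unit
    e = [z for z in range(x, y) if s[z] == n]  # [x, y[ intersected with s^-1(n)
    if not e:
        return labelling(n + 1, x, y, s, l, mul, unit)
    if len(e) == 1:
        return mul(labelling(n + 1, x, e[0], s, l, mul, unit), l[(n, y)])
    z0, z1 = e[0], e[1]
    head = labelling(n + 1, x, z0, s, l, mul, unit)
    return mul(mul(head, l[(n, z1)]), l[(n, y)])


def deterministic_reconstruction_agrees(word, mul, unit):
    """True iff labelling^1(x, y) == sigma(x, y) for every x <= y."""
    s = find_forward_split(word, mul, unit)
    l = left_data(word, mul, unit, s)
    n = len(word)
    return all(labelling(1, x, y, s, l, mul, unit) == sigma(word, mul, unit, x, y)
               for x in range(n) for y in range(x, n))
```

Note that $l_k(x)$ only looks to the left of $x$; this is why, once the split itself is deterministic, the coded data is deterministic too.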

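As in the previous case, the construction is easily adapted into a presentation by first-order formulas using the relation $<$ together with $N = \lceil (2|S|+1)\log_2(|S|) \rceil$ unary predicates coding all the possible values of $(s(x), l_1(x), \ldots, l_{|S|}(x))$. Since $s(x)$ and the values $l_k(x)$ only depend on $\sigma|_{\leq x}$ (by the determinism property of Theorem 6 and the definition of $l_k$), the sets $X_i(\alpha,\sigma)$ satisfy the determinism property. This concludes the proof of Theorem 8.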

7.3 Application to interpretations 

We prove in this section Theorem El Let us first give two lemmas which are consequences of 
standard techniques; either the compositional method, or tree automata. 

Lemma 25. Every monadic formula $\Phi(x_1, \ldots, x_n)$ is equivalent on trees to a formula of the form $\exists z_1 \ldots \exists z_k.\, \Phi'$ where $\Phi'$ is a boolean combination of monadic formulae of the form $x \sqsubseteq y \wedge \Psi(x,y)$ (of free variables $x, y$), $\Psi(x)$ (of free variable $x$) and $x = y$, for $x, y$ ranging in $\{x_1, \ldots, x_n, z_1, \ldots, z_k\}$.

Lemma 26. For every monadic formula of the form $x \sqsubseteq y \wedge \Phi(x,y)$ of free variables $x, y$, there exist a finite semigroup $S_\Phi$ and $A_\Phi \subseteq S_\Phi$ such that, for every tree $t$, there exists a mapping $\sigma$ which associates to all nodes $x \sqsubseteq y$ a value $\sigma(x,y) \in S_\Phi$, such that

– $\sigma$ restricted to every branch is an additive mapping, and

– for all nodes $x \sqsubseteq y$, $t \models \Phi(x,y)$ iff $\sigma(x,y) \in A_\Phi$.

Furthermore, $\sigma$ is monadically definable: for every $s \in S_\Phi$, there exists a monadic formula $\Phi_s(x,y)$ such that for every tree $t$ and nodes $x \sqsubseteq y$, $t \models \Phi_s(x,y)$ iff $\sigma(x,y) = s$.
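A toy instance of the shape of Lemma 26 (not of its general proof) can be written down directly. Take for $\Phi(x,y)$ the property 'some node $z$ with $x \sqsubseteq z \sqsubset y$ is labelled $a$'. With nodes represented as words over $\{0,1\}$, the semigroup $S_\Phi = (\{0,1\}, \max)$, each node contributing $1$ if it is labelled $a$ and $0$ otherwise, and $A_\Phi = \{1\}$, the mapping $\sigma$ below is additive along every branch and $\Phi(x,y)$ holds iff $\sigma(x,y) \in A_\Phi$. The formula, the tree and the names are our own choices for illustration.

```python
from itertools import product


def all_nodes(depth):
    """Nodes of the complete binary tree up to the given depth, as words over '01'."""
    return ["".join(p) for d in range(depth + 1) for p in product("01", repeat=d)]


def is_ancestor(x, y):
    """x ⊑ y: the node x is a prefix of the node y."""
    return y.startswith(x)


def sigma(labels, x, y):
    """max over the nodes z with x ⊑ z ⊏ y of [label(z) == 'a'] (0 for the empty path)."""
    value = 0
    for i in range(len(x), len(y)):
        z = y[:i]  # the ancestor of y at depth i
        value = max(value, 1 if labels.get(z) == "a" else 0)
    return value


def phi(labels, x, y):
    """Direct evaluation of Phi(x, y): some z with x ⊑ z ⊏ y is labelled 'a'."""
    return any(labels.get(y[:i]) == "a" for i in range(len(x), len(y)))


LABELS = {"": "b", "0": "a", "01": "b", "1": "b", "10": "a"}  # unlisted nodes: no label
A_PHI = {1}
```

Additivity here is simply $\max(\sigma(x,y), \sigma(y,z)) = \sigma(x,z)$ for $x \sqsubseteq y \sqsubseteq z$, since the path from $x$ to $z$ splits at $y$.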

And the result is then the following. 

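Theorem 9. For every monadic interpretation $\mathcal{I}_{\mathrm{MSO}}$, there exist a monadic marking $\mathcal{M}_{\mathrm{MSO}}$ and a first-order interpretation $\mathcal{I}_{\mathrm{FO}}$ such that for every tree $t$, $\mathcal{I}_{\mathrm{MSO}}(t) = \mathcal{I}_{\mathrm{FO}}(\mathcal{M}_{\mathrm{MSO}}(t))$.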

Proof. Without loss of generality, we prove the result for an interpretation $\mathcal{I}_{\mathrm{MSO}}$ with a single formula $\Phi(x_1, \ldots, x_n)$. Using Lemma 25, we just have to show how to obtain an equivalent of a formula of the form $x \sqsubseteq y \wedge \Phi(x,y)$ as the combination of a monadic marking and a first-order formula. For this, we use Lemma 26, which tells us that the value of $\Phi(x,y)$ can be uncovered by projection of an additive labelling. And we use Theorem 8 for reducing the computation of the additive labelling to the combination of a monadic marking and a first-order formula.

Note that this argument heavily relies on the determinism of the construction of Theorem 8. Indeed, one has to mark every branch of a tree, a priori with a different marking. The determinism property allows us to have a single marking for the whole tree: two branches through a node induce the same labelling up to that node, hence the same marks at that node. □
7.4 Consequences for infinite structures

The goal of this section is to show how the results given above, namely Theorem 9, have direct new consequences for the definition of some families of finitely presentable infinite structures. There is no real technical contribution in this section, but rather a presentation of those consequences for the theory of infinite structures. Let us warn the reader that we do not intend to provide a survey of this area, since this would require much more space and would be off topic. We rather directly concentrate on providing Theorems 11 and 12. Essentially, those results show that in the standard characterisations of the family of prefix-recognizable graphs, as well as of the Caucal hierarchy, one can replace the monadic interpretations by first-order ones.

The prefix-recognizable graphs were introduced by Caucal via an internal definition. Namely, fix a finite alphabet $A$. A prefix-recognizable graph is an infinite directed graph defined as follows. Its set of vertices is a regular language over the alphabet $A$, and each edge relation is a finite union of relations of the form $(U \times V).W$ with

$$(U \times V).W = \{(uw, vw) : u \in U,\ v \in V,\ w \in W\} ,$$

for $U, V, W$ regular languages. By extension, a graph is prefix-recognizable if it is isomorphic to such a graph. An important property of those graphs is that their monadic theory is decidable (this fact is due to Caucal; it can easily be seen as a direct consequence of Rabin's theorem [21], stating that the complete binary tree has a decidable monadic theory, together with Theorem 10 below).
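The definition of $(U \times V).W$ can be explored on a finite window of vertices. In the sketch below (our own illustration; names are hypothetical), $U$, $V$ and $W$ are given as Python regular expressions matched against whole words, and for each pair of vertices every decomposition of the source as $uw$ is tried. With $U = \{\varepsilon\}$, $V = \{a\}$ and $W = a^*$ over the vertices $a^*$, the edges are $a^n \to a^{n+1}$, a copy of the successor relation on the natural numbers.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """All words over `alphabet` of length at most max_len."""
    return ["".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n)]


def prefix_edges(U, V, W, vertices):
    """Edges (u w, v w) with u in U, v in V, w in W, restricted to a finite vertex set.

    U, V, W are regular expressions (Python `re` syntax) matched against whole words.
    """
    edges = set()
    for src in vertices:
        for dst in vertices:
            for i in range(len(src) + 1):  # split src as u . w, trying every suffix w
                u, w = src[:i], src[i:]
                if not dst.endswith(w):
                    continue
                v = dst[:len(dst) - len(w)]
                if re.fullmatch(U, u) and re.fullmatch(V, v) and re.fullmatch(W, w):
                    edges.add((src, dst))
                    break
    return edges


if __name__ == "__main__":
    # (epsilon x a).a*: the successor relation a^n -> a^(n+1)
    print(sorted(prefix_edges("", "a", "a*", words("a", 3))))
    # [('', 'a'), ('a', 'aa'), ('aa', 'aaa')]
```

Swapping the roles of $U$ and $V$ gives the predecessor relation on the same vertices.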

There exist different characterisations of this class of graphs. We will use the following one:

Theorem 10 (Blumensath [3]). A graph is prefix-recognizable iff it is isomorphic to a monadic interpretation of the complete binary tree.

Using this theorem as guide, one can extend the definition of prefix-recognisability to relational 
structures: we call a relational structure prefix-recognizable if it is monadically interpretable in 
the complete binary tree. 

Theorem 9 provides another, new, characterisation of prefix-recognizable structures: Theorem 11. Beforehand, we need the following lemma.

Lemma 27. Let $t$ be a regular tree. Then there exists a first-order interpretation $\mathcal{I}_{\mathrm{FO}}$ such that $t$ is isomorphic to $\mathcal{I}_{\mathrm{FO}}(\Delta_2)$.

Proof. It is sufficient to consider that the regular tree is the complete binary tree together with a regular labelling in some finite alphabet $A$ attached to every node. This means that there exists a deterministic and complete finite automaton $\mathcal{A}$ of finite words over the alphabet $\{0,1\}$, with each state labelled by a letter in $A$, such that the label of a node $u$ is the letter attached to the sole state reached from the initial state while reading $u$. Let this automaton have states $Q$, initial state $q_0$, and transition function $\delta$ from $Q \times \{0,1\}$ to $Q$. As usual, we extend this transition function into a mapping from $Q \times \{0,1\}^*$ to $Q$. Without loss of generality, we can assume that there exists also a mapping $d$ from $Q$ to $\{0,1\}$ such that for every state $q$ in $Q$ and letter $a$ in $\{0,1\}$, $d(\delta(q,a)) = a$; i.e. the automaton remembers whether the current node is a left or a right child.

Let $n$ be a mapping numbering the states of $\mathcal{A}$ from $1$ to $|Q|$. Given a word $u = a_1 a_2 \cdots a_m$, the $a_i$'s being letters in $\{0,1\}$, define:

$$f(u) = 1\,0^{n(q_0)}\,1\,0^{n(q_1)}\,1 \cdots 1\,0^{n(q_m)}\,1 ,$$

in which $q_0, q_1, \ldots, q_m$ are the $m+1$ states successively assumed by the automaton while reading the word $u$. The proviso concerning the mapping $d$ makes $f$ an injective mapping.

The image of $f$ is first-order definable (as a language of words). Indeed, in order to verify that a word belongs to the image of $f$, it is sufficient to check a) that $1\,0^{n(q_0)}\,1$ is a prefix, b) that the last letter is $1$, and c) that every factor of the form $1\,0^i\,1\,0^j\,1$ is such that $i = n(p)$ and $j = n(q)$ for some transition $\delta(p,a) = q$. Those verifications are first-order definable. Furthermore, for every word $u$, the state $\delta(q_0,u)$ is nothing but the sole state $q$ such that $1\,0^{n(q)}\,1$ is a suffix of $f(u)$. This is also first-order definable.
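The encoding $f$ and the checks a), b) and c) can be run on a concrete automaton. The automaton below is a hypothetical example, not taken from the text: it tracks the parity of the number of $1$s read, and its non-initial states also record the last letter read, so that $d$ (the second component of a state) satisfies $d(\delta(q,a)) = a$. The numbering $n$ is an arbitrary bijection onto $\{1, \ldots, |Q|\}$, and all names are ours.

```python
from itertools import product

START = "start"
STATES = [START, (0, 0), (0, 1), (1, 0), (1, 1)]  # (parity of #1s, last letter)
NUM = {q: i + 1 for i, q in enumerate(STATES)}    # the numbering n : Q -> {1..|Q|}
INV = {v: q for q, v in NUM.items()}


def delta(q, a):
    """Transition function: update the parity and record the letter a just read."""
    parity = 0 if q == START else q[0]
    return ((parity + a) % 2, a)


def run(u):
    """The states q_0, q_1, ..., q_m successively assumed while reading u."""
    states = [START]
    for a in u:
        states.append(delta(states[-1], a))
    return states


def f(u):
    """f(u) = 1 0^{n(q_0)} 1 0^{n(q_1)} 1 ... 1 0^{n(q_m)} 1."""
    return "".join("1" + "0" * NUM[q] for q in run(u)) + "1"


def in_image(w):
    """Checks a) prefix 1 0^{n(q_0)} 1, b) last letter 1, c) consecutive blocks are transitions."""
    if not (w.startswith("1" + "0" * NUM[START] + "1") and w.endswith("1")):
        return False
    blocks = w[1:-1].split("1")  # the maximal 0-blocks between consecutive 1s
    if any(len(b) not in INV for b in blocks):
        return False
    transitions = {(p, delta(p, a)) for p in STATES for a in (0, 1)}
    return all((INV[len(b)], INV[len(c)]) in transitions for b, c in zip(blocks, blocks[1:]))


def decode(w):
    """Recover u from f(u): by the proviso on d, each state after q_0 names the letter read."""
    return [INV[len(b)][1] for b in w[1:-1].split("1")[1:]]
```

Since `decode` recovers $u$ from $f(u)$, distinct words get distinct codes, which is the injectivity claimed above.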

From those remarks, it is easy to give a first-order interpretation which, given the complete binary tree, selects the nodes belonging to the image of $f$, and labels every node $f(u)$ by the state $\delta(q_0,u)$. This interpretation provides a new tree $t'$. Since all the relevant information (the label of the node, and its right-child/left-child nature) is encoded in each state, it is easy to first-order interpret $t$ in $t'$. □

Theorem 11. A structure is prefix-recognizable iff it is isomorphic to a first-order interpretation (with ancestor relation) of the complete binary tree.

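Proof. Since first-order interpretations are particular monadic interpretations, one direction is immediate. Conversely, we have to show that, given a monadic interpretation $\mathcal{I}_{\mathrm{MSO}}$, there exists a first-order interpretation $\mathcal{I}_{\mathrm{FO}}$ such that $\mathcal{I}_{\mathrm{MSO}}(\Delta_2)$ is isomorphic to $\mathcal{I}_{\mathrm{FO}}(\Delta_2)$. Using Theorem 9, we have that $\mathcal{I}_{\mathrm{MSO}}(\Delta_2)$ is equal to $\mathcal{I}'_{\mathrm{FO}}(\mathcal{M}_{\mathrm{MSO}}(\Delta_2))$ for some monadic marking $\mathcal{M}_{\mathrm{MSO}}$ and first-order interpretation $\mathcal{I}'_{\mathrm{FO}}$. Then, since $\mathcal{M}_{\mathrm{MSO}}(\Delta_2)$ is a regular tree, Lemma 27 gives an interpretation $\mathcal{I}''_{\mathrm{FO}}$ such that $\mathcal{I}''_{\mathrm{FO}}(\Delta_2)$ is isomorphic to $\mathcal{M}_{\mathrm{MSO}}(\Delta_2)$. By closure of first-order interpretations under composition, $\mathcal{I}_{\mathrm{FO}} = \mathcal{I}'_{\mathrm{FO}} \circ \mathcal{I}''_{\mathrm{FO}}$ is a first-order interpretation such that $\mathcal{I}_{\mathrm{FO}}(\Delta_2)$ is isomorphic to $\mathcal{I}_{\mathrm{MSO}}(\Delta_2)$. □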

A similar approach can be used for characterising the Caucal hierarchy. The Caucal hierarchy [12] is an extension of prefix-recognizable graphs to 'higher order'. We use here the characterisation of Carayol and Wöhrle [8] as a definition:

– The structures in $\mathrm{Struct}_0$ are the finite relational structures.

– The graphs in $\mathrm{Graph}_n$ are the structures in $\mathrm{Struct}_n$ having a graph signature.

– The trees in $\mathrm{Tree}_{n+1}$ are the unfoldings of graphs in $\mathrm{Graph}_n$.

– The structures in $\mathrm{Struct}_{n+1}$ are the monadic interpretations of trees in $\mathrm{Tree}_{n+1}$.

Since both the monadic interpretation and the unfolding preserve the decidability of the monadic 
theory, the trees, graphs and structures in the classes defined above have a decidable monadic 
theory. 

The following theorem shows that, in the definition of this hierarchy, monadic logic can be replaced by first-order logic.

Theorem 12. The structures in $\mathrm{Struct}_n$ are, up to isomorphism, the first-order interpretations of trees in $\mathrm{Tree}_n$.

In fact, this is a direct consequence of Theorem [9] together with the following proposition (see 
[8], Proposition 1). 

Proposition 2. The class $\mathrm{Tree}_n$ is closed under monadic markings.